The goal of this paper is to develop a general method to establish
conditional ergodicity of infinite-dimensional Markov chains. Given a
Markov chain in a product space, we aim to understand the ergodic
properties of its conditional distributions given one of the components.
Such questions play a fundamental role in the ergodic theory
of nonlinear filters. In the setting of Harris chains, conditional ergodicity
has been established under general nondegeneracy assumptions.
Unfortunately, Markov chains in infinite-dimensional state spaces are
rarely amenable to the classical theory of Harris chains due to the singularity
of their transition probabilities, while topological and functional
methods that have been developed in the ergodic theory of
infinite-dimensional Markov chains are not well suited to the investigation
of conditional distributions. We must therefore develop new
measure-theoretic tools in the ergodic theory of Markov chains that
enable the investigation of conditional ergodicity for infinite-dimensional
or weak-* ergodic processes. To this end, we first develop local
counterparts of zero-two laws that arise in the theory of Harris chains.
These results give rise to ergodic theorems for Markov chains that admit
asymptotic couplings or that are locally mixing in the sense of H.
Föllmer, and to a non-Markovian ergodic theorem for stationary absolutely
regular sequences. We proceed to show that local ergodicity
is inherited by conditioning on a nondegenerate observation process.
This is used to prove stability and unique ergodicity of the nonlinear
filter. Finally, we show that our abstract results can be applied to
infinite-dimensional Markov processes that arise in several settings,
including dissipative stochastic partial differential equations, stochastic
spin systems, and stochastic differential delay equations.
1. Introduction. The classical ergodic theory of Markov chains in general
state spaces has achieved a rather definitive form in the theory of Harris
chains [34, 30, 29], which provides necessary and sufficient conditions for the 
convergence of the transition probabilities in total variation to an invariant 
measure. While this theory is formulated in principle for any measurable 
state space, it is well known that its applicability extends in practice mainly 
to finite-dimensional situations. In infinite dimension, the transition proba- 
bilities from different initial conditions tend to be mutually singular even in 
the most trivial examples, so that total variation convergence is out of the 
question. For this reason, many infinite-dimensional Markov processes, in- 
cluding stochastic partial differential equations, interacting particle systems, 
and stochastic equations with memory, lie outside the scope of the classi- 
cal theory. Instead, a variety of different approaches, including topological 
[9, 17, 19], functional [26, 18], coupling and duality [24] methods, have been 
employed to investigate the ergodicity of infinite-dimensional models. 

The goal of this paper is to investigate questions of conditional ergodicity 
in infinite dimension. Consider a Markov chain $(X_n,Y_n)_{n\ge 0}$ taking values
in a product space $E\times F$ (continuous time processes are considered analogously).
The aim of conditional ergodic theory is to understand the ergodic
properties of one component of the process $(X_n)_{n\ge 0}$ under the conditional
distribution given the other component $(Y_n)_{n\ge 0}$. Even when the process
$(X_n,Y_n)_{n\ge 0}$ is ergodic, the inheritance of ergodicity under conditioning is
far from obvious and does not always hold. The history of such problems
dates back to an erroneous result of H. Kunita [21], where the inheritance 
of ergodicity was taken for granted (see [43] and the references therein). 
The long-standing problem of establishing conditional ergodicity under gen- 
eral assumptions was largely resolved in [40, 39], where it is shown that 
the inheritance of ergodicity holds under a mild nondegeneracy assumption 
when $(X_n,Y_n)_{n\ge 0}$ is a Harris chain. Numerous other results in this area,
both of a qualitative and quantitative nature, are reviewed in [8]. All these 
results are however essentially restricted to the setting of Harris chains, so 
that their applicability to infinite-dimensional models is severely limited. In 
this paper, we develop the first results of this kind that are generally applicable
beyond the Harris setting and, in particular, that allow us to establish
conditional ergodicity in a wide range of infinite-dimensional models.

To give a flavor of the type of problems that our theory will address, let us 
briefly describe one example that will be given in section 5 below. Consider 
the velocity field u of a fluid that is modeled as a Navier-Stokes equation 

$$du = \{\nu\,\Delta u - (u\cdot\nabla)u - \nabla p\}\,dt + dw, \qquad \nabla\cdot u = 0,$$

with white in time, spatially smooth random forcing $dw$. At regular time
intervals $t_n = n\delta$ the velocity field is sampled at the spatial locations $z_1,\ldots,z_r$
with some additive Gaussian noise $\xi_n$, which yields the observations

$$Y_n^i = u(t_n,z_i) + \xi_n^i, \qquad i = 1,\ldots,r.$$

Such models arise naturally in data assimilation problems [38]. The process
$(X_n,Y_n)_{n\ge 0}$ with $X_n = u(t_n,\cdot)$ is an infinite-dimensional Markov chain.
Classical ergodicity questions include the existence and uniqueness of an
invariant probability $\lambda$, and the convergence to equilibrium property

$$|\mathbf{E}^x[f(X_n)] - \lambda(f)| \xrightarrow{n\to\infty} 0$$

for a sufficiently large class of functions $f$ and initial conditions $x$. Such
questions are far from straightforward for Navier-Stokes equations and have 
formed a very active area of research in recent years (see, for example, [16, 28, 
20]). In contrast, we are interested in the question of conditional ergodicity 

$$|\mathbf{E}^x[f(X_n)\,|\,\mathcal{F}^Y_{0,\infty}] - \mathbf{E}^\lambda[f(X_n)\,|\,\mathcal{F}^Y_{0,\infty}]| \xrightarrow{n\to\infty} 0$$

(where $\mathcal{F}^Y_{m,n} = \sigma\{Y_m,\ldots,Y_n\}$), or, more importantly, its causal counterpart

$$|\mathbf{E}^x[f(X_n)\,|\,\mathcal{F}^Y_{0,n}] - \mathbf{E}^\lambda[f(X_n)\,|\,\mathcal{F}^Y_{0,n}]| \xrightarrow{n\to\infty} 0,$$

which corresponds to stability of the nonlinear filter $\pi_n^\mu = \mathbf{P}^\mu[X_n\in\cdot\,|\,\mathcal{F}^Y_{0,n}]$.
In contrast to convergence to equilibrium of the underlying model, conditional
ergodicity properties yield convergence to equilibrium of the estimation
error of the model given the observations [21] or the long-term stability
of the conditional distributions to perturbations (such as those that arise in 
the investigation of numerical filtering algorithms), cf. [42]. The interplay be- 
tween ergodicity and conditioning is of intrinsic interest in probability theory 
and in measurable dynamics, where it is closely related to notions of relative 
mixing [35] , and lies at the heart of stability problems that arise in data as- 
similation and nonlinear filtering. The main results of this paper will allow 
us to establish conditional ergodicity in a wide range of infinite-dimensional 
models, including dissipative stochastic partial differential equations such 
as the above Navier-Stokes model, stochastic spin systems, and stochastic 
differential delay equations (detailed examples are given in section 5). 

One of the main difficulties in the investigation of conditional ergodicity 
is that conditioning on an infinite observation sequence is a very singular
operation. Under the conditional distribution, the unobserved process
$(X_n)_{n\ge 0}$ remains a Markov chain, albeit an inhomogeneous one with random
transition probabilities depending on the realized path of the observations
$(Y_n)_{n\ge 0}$ (in the stationary case this is a Markov chain in a random environment
in the sense of Cogburn and Orey [7, 31]). These conditional transition
probabilities are defined abstractly as regular conditional probabilities, but 
no explicit equations are available even in the simplest examples. There is 
therefore little hope of analyzing the properties of the conditional chain "by 
hand," and one must find a way to deduce the requisite ergodic properties 
from their unconditional counterparts. On the other hand, conditioning is 
an essentially measure-theoretic operation, and it is unlikely that the most 
fruitful approaches to ergodic theory in infinite dimension, such as topolog- 
ical properties or functional inequalities, are preserved by the conditional 
distributions. To move beyond the setting of Harris chains, we therefore aim 
to find a way to encode such weak ergodic properties in a measure-theoretic 
fashion that can be shown to be preserved under conditioning. 

A central insight of this paper is that certain basic elements of the classical 
theory admit local formulations that do not rely on the Markov property. 
The simplest of these is a local zero-two law (section 2.1) that characterizes, 
for a given $E$-valued Markov chain $(X_n)_{n\ge 0}$ and measurable map $\iota : E\to E'$
to another space $E'$, the following total variation ergodic property:

$$\|\mathbf{P}^x[(\iota(X_k))_{k\ge n}\in\cdot\,] - \mathbf{P}^{x'}[(\iota(X_k))_{k\ge n}\in\cdot\,]\| \xrightarrow{n\to\infty} 0 \quad\text{for all } x,x'\in E.$$

If $\iota$ is injective, then this reduces to the ergodic property of a Harris chain.
By choosing different functions $\iota$, however, we will find that such results are
applicable far beyond the setting of Harris chains. Let us emphasize that
when $\iota$ is not injective the process $(\iota(X_n))_{n\ge 0}$ is generally not Markov, so
that our local ergodic theorems are fundamentally non-Markovian in nature. 

In certain cases, this local notion of ergodicity can be applied directly to 
infinite-dimensional Markov chains. When the entire chain does not converge 
to equilibrium in total variation, it may still be the case that each finite- 
dimensional projection of the chain converges in the above sense. To our 
knowledge, this local mixing property was first proposed by H. Föllmer [15] in
the context of interacting particle systems; a similar idea appears in [27] for
stochastic Navier-Stokes equations with sufficiently nondegenerate forcing.
By choosing $\iota$ to be a finite-dimensional projection, we obtain a very useful
characterization of the local mixing property (section 2.2). Our results can 
also be applied directly to non-Markovian processes: for example, we will 
obtain a non-Markovian ergodic theorem that provides an apparently new 
characterization of stationary absolutely regular sequences (section 2.3). 

While local mixing can be verified in various infinite-dimensional models, 
this generally requires a fair amount of nondegeneracy. In truly degenerate 
situations, we introduce another idea that exploits topological properties of 
the model (section 2.4). In dissipative models and in many other Markov 
chains that converge weakly to equilibrium, it is possible to construct a coupling
of two copies $X_n, X_n'$ of the chain such that $d(X_n,X_n')\to 0$ (cf. [17]).
Of course, this need not imply any form of total variation convergence. Consider,
however, the perturbed process $f(X_n)+\eta_n$, where $f : E\to\mathbb{R}$ is a Lipschitz
function and $(\eta_n)_{n\ge 0}$ is an i.i.d. sequence of auxiliary Gaussian random
variables. When the asymptotic coupling converges sufficiently rapidly, the
process $(f(X_n)+\eta_n)_{n\ge 0}$ will be ergodic in the above total variation sense by
the Kakutani theorem. We have thus transformed a topological property into
a measure-theoretic one, which is amenable to our local ergodic theorems by
considering the augmented Markov chain $(X_n,\eta_n)_{n\ge 0}$ with $\iota(x,\eta) = f(x)+\eta$.
The added noise can ultimately be deconvolved, which yields weak-* ergodic
theorems for the original chain $(X_n)_{n\ge 0}$ by purely measure-theoretic means.
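
To make the role of the Kakutani theorem concrete, here is a minimal numerical sketch under simplifying assumptions that are ours and not the paper's: the two coupled paths are treated as fixed, the gap $|f(X_n)-f(X_n')|$ is replaced by a hypothetical deterministic sequence $\delta_n = r^n$, and the auxiliary noise is standard Gaussian. For two Gaussian product measures with unit variances whose means differ by $(\delta_k)_{k\ge n}$, the total variation distance (normalized as $2\sup_A$) equals $2\,\mathrm{erf}\big(\|\delta\|_{\ell^2}/(2\sqrt{2})\big)$, which tends to zero as $n\to\infty$ whenever $\sum_k\delta_k^2<\infty$.

```python
import math

def gaussian_shift_tv(l2_norm):
    """2 * sup_A |mu(A) - nu(A)| for unit-variance Gaussian product measures
    whose mean sequences differ by a vector with the given l2 norm."""
    return 2.0 * math.erf(l2_norm / (2.0 * math.sqrt(2.0)))

def tail_l2_norm(r, n):
    """l2 norm of the tail (r**k)_{k >= n} for 0 < r < 1 (geometric series)."""
    return r ** n / math.sqrt(1.0 - r * r)

r = 0.8  # hypothetical contraction rate of the asymptotic coupling
tv_tail = [gaussian_shift_tv(tail_l2_norm(r, n)) for n in range(0, 41, 10)]
print(tv_tail)
```

If the gaps $\delta_n$ were not square summable, the two product measures would be mutually singular and the distance would stay equal to 2 for every $n$; this is why the coupling has to converge sufficiently rapidly.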

The local ergodic theorems developed in section 2 are of independent 
interest. However, the full benefit of our approach emerges in the develop- 
ment of the conditional ergodic theory that is undertaken in sections 3 and 
4. First, we develop in section 3.1 a conditional counterpart to the local zero- 
two law that characterizes the conditional absolute regularity property of a 
stationary (non-Markovian) sequence. The remainder of section 3 is devoted 
to the inheritance problem. In short, we show that under a generalization 
of the nondegeneracy assumption on the observations that was introduced 
in [40, 39], the local ergodicity property is inherited when we condition on
the observed component of the model. Together with the ideas developed in 
section 2, this allows us to obtain various filter stability results in section 4. 
After introducing the relevant setting and notations in section 4.1, we first 
develop a general local filter stability theorem in section 4.2. In section 4.3, 
we give concrete filter stability theorems for Markov chains that are locally 
mixing or that admit asymptotic couplings. We also investigate unique er- 
godicity of the filtering process in the spirit of [21]. Finally, in section 4.4, we 
extend our main results to Markov processes in continuous time. Our general 
approach in these sections is inspired by the ideas developed in [40] in the 
Harris setting. However, as is explained in section 3, the approach used in 
[40, 39] relies crucially on the Markov property, and the same method can 
therefore not be used in the local setting. Instead, we develop here a new 
(and in fact somewhat more direct) method for establishing the inheritance 
property that does not rely on Markov-specific arguments. 

To illustrate the wide applicability of our results, we develop in section 
5 several infinite-dimensional examples that were already mentioned above. 
Our aim is to demonstrate that the assumptions of our main results can be 
verified in several quite distinct settings. In order not to unduly lengthen the 
paper, we have restricted attention to a number of examples whose ergodic 
properties are readily verified using existing results in the literature. 

Let us conclude the introduction by briefly highlighting two directions 
that are not addressed in this paper. First, we emphasize that all the re- 
sults in this paper, which rely at their core on martingale convergence ar- 
guments, are qualitative in nature. The development of quantitative filter 
stability results is an interesting problem, and this remains challenging even 
in finite-dimensional models (cf. [8] and the references therein). Second, let 
us note that the crucial assumption that ensures inheritance of ergodicity 
under conditioning is nondegeneracy of the observations (Assumption 4.3). 
This assumption states, in essence, that the conditional distribution of the 
observations on a finite time interval, given the unobserved process, has a 
transition density. In practice, this implies that while the unobserved pro- 
cess Xn may be infinite-dimensional, the observations must typically be 
finite-dimensional in order to apply our results. While the present setting 
covers a wide range of models of practical interest, conditional ergodicity 
problems with infinite-dimensional observations are of significant interest in 
their own right and require separate consideration. In the latter setting new 
probabilistic phenomena can arise; such issues will be discussed elsewhere. 
2. Local ergodic theorems. The goal of this section is to develop
a number of simple but powerful measure-theoretic ergodic theorems that 
are applicable beyond the classical setting of Harris chains [34]. Our main 
tools are the local zero-two laws developed in section 2.1. In the following 
subsections, it is shown how these results can be applied in various different 
settings. In section 2.2, we consider a notion of local mixing for Markov 
chains, due to H. Föllmer [15], that provides a natural measure-theoretic
generalization of Harris chains to the infinite-dimensional setting. In section 
2.3, we obtain an ergodic theorem for non-Markov processes that yields a 
new characterization of stationary absolutely regular sequences. Finally, in 
section 2.4, we show how these results can be combined with the notion of 
asymptotic coupling (see, for example, [17]) to obtain ergodic theorems in 
the weak convergence topology by purely measure-theoretic means. 

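Throughout this section, we will work in the following canonical setup. Let
$(E,\mathcal{E})$ be a measurable space, and let $(X_k)_{k\in\mathbb{Z}}$ be the $E$-valued coordinate
process defined on the canonical path space $(\Omega,\mathcal{F})$. That is, we define $\Omega = E^{\mathbb{Z}}$,
$\mathcal{F} = \mathcal{E}^{\mathbb{Z}}$, and $X_k(\omega) = \omega(k)$. We define for $m<n$

$$X_{m,n} = (X_k)_{m\le k\le n}, \qquad \mathcal{F}_{m,n} = \sigma\{X_{m,n}\}, \qquad \mathcal{F}_+ = \mathcal{F}_{0,\infty}, \qquad \mathcal{F}_- = \mathcal{F}_{-\infty,0}.$$

We also define the canonical shift $\Theta : \Omega\to\Omega$ as $\Theta(\omega)(n) = \omega(n+1)$.

We will denote by $\mathcal{P}(Z)$ the set of probability measures on a measurable
space $(Z,\mathcal{Z})$, and for $\mu,\nu\in\mathcal{P}(Z)$ we denote by $\|\mu-\nu\|_{\mathcal{Z}_0}$ the total variation
of the signed measure $\mu-\nu$ on the $\sigma$-field $\mathcal{Z}_0\subseteq\mathcal{Z}$, that is,

$$\|\mu-\nu\|_{\mathcal{Z}_0} = 2\sup_{A\in\mathcal{Z}_0}|\mu(A)-\nu(A)|.$$

For simplicity, we will write $\|\mu-\nu\| = \|\mu-\nu\|_{\mathcal{Z}}$. Let us recall that if $K, K'$ are
finite kernels and if $\mathcal{Z}$ is countably generated, then $x\mapsto\|K(x,\cdot)-K'(x,\cdot)\|$
is measurable (see, for example, [40, Lemma 2.4]). In this setting, we have
$\|\int\{K(x,\cdot)-K'(x,\cdot)\}\,\mu(dx)\| \le \int\|K(x,\cdot)-K'(x,\cdot)\|\,\mu(dx)$ by Jensen's
inequality. These facts will be used repeatedly throughout the paper.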
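
As a small numerical illustration of this definition (not part of the original development), the following sketch uses two hypothetical probability vectors on a four-point space. It checks by brute force that $2\sup_A|\mu(A)-\nu(A)|$ over all subsets coincides with the $\ell^1$ distance, and that the distance on a coarser $\sigma$-field, here generated by a hypothetical two-block partition, can be strictly smaller, in this case zero.

```python
from itertools import combinations

def tv_full(mu, nu):
    """2 * sup over all subsets A of |mu(A) - nu(A)|, computed by brute force."""
    n = len(mu)
    best = 0.0
    for r in range(n + 1):
        for A in combinations(range(n), r):
            best = max(best, abs(sum(mu[i] - nu[i] for i in A)))
    return 2.0 * best

def tv_partition(mu, nu, blocks):
    """Total variation on the sigma-field generated by a partition: events are
    unions of blocks, so the distance is the l1 distance of the block masses."""
    return sum(abs(sum(mu[i] - nu[i] for i in b)) for b in blocks)

mu = [0.1, 0.4, 0.3, 0.2]
nu = [0.4, 0.1, 0.2, 0.3]
l1 = sum(abs(a - b) for a, b in zip(mu, nu))
full = tv_full(mu, nu)
coarse = tv_partition(mu, nu, [(0, 1), (2, 3)])
print(full, l1, coarse)
```

The two measures are far apart on the full $\sigma$-field but indistinguishable on the coarse one, which is the elementary mechanism behind working with a smaller $\sigma$-field.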

2.1. Local zero-two laws. Let $P : E\times\mathcal{E}\to[0,1]$ be a transition kernel
on $(E,\mathcal{E})$, and denote by $\mathbf{P}^\mu$ the probability measure on $\mathcal{F}_+$ such that
$(X_k)_{k\ge 0}$ is Markov with transition kernel $P$ and initial law $X_0\sim\mu\in\mathcal{P}(E)$.
If $P$ is Harris and aperiodic, the Markov chain is ergodic in the sense that

$$\|\mu P^n - \nu P^n\| = \|\mathbf{P}^\mu - \mathbf{P}^\nu\|_{\mathcal{F}_{n,\infty}} \xrightarrow{n\to\infty} 0 \quad\text{for all } \mu,\nu\in\mathcal{P}(E)$$

(cf. [34, Theorem 6.2.2]). Unfortunately, this mode of convergence can be
restrictive in complex models. For example, when the state space $E$ is
infinite-dimensional, such strong convergence will rarely hold: it is often the case in
this setting that $\mu P^n \perp \nu P^n$ for all $n\ge 0$ (cf. section 2.2).
At the heart of this paper lies a simple idea. When total variation convergence
of the full chain fails, it may still be the case that total variation
convergence holds when the chain is restricted to a smaller $\sigma$-field $\mathcal{E}^0\subseteq\mathcal{E}$: that
is, we intend to establish convergence of $\iota(X_k)$ where $\iota : (E,\mathcal{E})\to(E,\mathcal{E}^0)$ is
the identity map. As will become clear in the sequel, such local total vari- 
ation convergence is frequently sufficient to deduce convergence of the full 
chain in a weaker probability distance, while at the same time admitting a 
powerful measure-theoretic ergodic theory that will be crucial for the study 
of conditional ergodicity in complex models. 

The key results of this section are a pair of local zero-two laws that characterize
the local total variation convergence of Markov processes. Let us fix
$\mathcal{E}^0\subseteq\mathcal{E}$ throughout this section, and define the $\sigma$-fields

$$\mathcal{F}^0_{m,n} = \bigvee_{m\le k\le n} X_k^{-1}(\mathcal{E}^0), \qquad m<n.$$

A central role will be played by the local tail $\sigma$-field

$$\mathcal{A}^0 = \bigcap_{n\ge 0}\mathcal{F}^0_{n,\infty}.$$

Finally, for $x\in E$ we will denote for simplicity $\mathbf{P}^x = \mathbf{P}^{\delta_x}$.

It is important to note that the local process $\iota(X_k)$ is generally not
Markov, so that the marginal distribution at a fixed time does not determine
the future of this process. Thus one cannot restrict attention to the marginal
distance $\|\mu P^n - \nu P^n\|_{\mathcal{E}^0}$, but one must instead consider the entire infinite
future $\|\mathbf{P}^\mu - \mathbf{P}^\nu\|_{\mathcal{F}^0_{n,\infty}}$. Of course, when $\mathcal{E}^0 = \mathcal{E}$, these notions coincide.

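Theorem 2.1 (Local zero-two law). The following are equivalent.

1. The Markov chain is locally ergodic:

$$\|\mathbf{P}^\mu - \mathbf{P}^\nu\|_{\mathcal{F}^0_{n,\infty}} \xrightarrow{n\to\infty} 0 \quad\text{for every } \mu,\nu\in\mathcal{P}(E).$$

2. The local tail $\sigma$-field is trivial:

$$\mathbf{P}^\mu(A)\in\{0,1\} \quad\text{for every } A\in\mathcal{A}^0 \text{ and } \mu\in\mathcal{P}(E).$$

3. The Markov chain is locally irreducible: there exists $\alpha>0$ such that

$$\forall\, x,x'\in E, \quad \exists\, n\ge 0 \text{ such that } \|\mathbf{P}^x - \mathbf{P}^{x'}\|_{\mathcal{F}^0_{n,\infty}} \le 2-\alpha.$$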

Zero-two laws of this type appear naturally in the theory of Harris chains 
[34, 11, 32]. It is somewhat surprising that the Markov property proves to 
be inessential in the proof, which enables the present local formulation. 
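Proof of Theorem 2.1. We prove $2\Rightarrow 1\Rightarrow 3\Rightarrow 2$.

($2\Rightarrow 1$). Assumption 2 implies that $\mathbf{P}^\mu(A) = \mathbf{P}^\nu(A)$ for all $A\in\mathcal{A}^0$ (if
not, then $\mathbf{P}^\rho(A) = 1/2$ for $\rho = (\mu+\nu)/2$, a contradiction). Therefore

$$\|\mathbf{P}^\mu - \mathbf{P}^\nu\|_{\mathcal{F}^0_{n,\infty}} \xrightarrow{n\to\infty} \|\mathbf{P}^\mu - \mathbf{P}^\nu\|_{\mathcal{A}^0} = 0.$$

($1\Rightarrow 3$). This is obvious.

($3\Rightarrow 2$). Assume that condition 2 does not hold. Then there exist $A\in\mathcal{A}^0$
and $\mu\in\mathcal{P}(E)$ such that $0<\mathbf{P}^\mu(A)<1$. Define $f = 1_A - 1_{A^c}$, and note that

$$\mathbf{E}^{X_n}[f\circ\Theta^{-n}] = \mathbf{E}^\mu[f\,|\,\mathcal{F}_{0,n}] \xrightarrow{n\to\infty} f \quad \mathbf{P}^\mu\text{-a.s.}$$

Define the probability measure $\mathbf{Q}$ on $\Omega\times\Omega$ as $\mathbf{P}^\mu\otimes\mathbf{P}^\mu$, and denote by
$(X_n,X_n')_{n\ge 0}$ the coordinate process on $\Omega\times\Omega$. Fix $\alpha>0$. Then

$$\mathbf{Q}\big[|\mathbf{E}^{X_n}[f\circ\Theta^{-n}] - \mathbf{E}^{X_n'}[f\circ\Theta^{-n}]| > 2-\alpha\big] \xrightarrow{n\to\infty} 2\,\mathbf{P}^\mu(A)\,\mathbf{P}^\mu(A^c) > 0.$$

Thus there exist $N\ge 0$ and $x,x'\in E$ such that $|\mathbf{E}^x[f\circ\Theta^{-N}] - \mathbf{E}^{x'}[f\circ\Theta^{-N}]| > 2-\alpha$.
But note that $|f|\le 1$ and $f\circ\Theta^{-N}$ is $\mathcal{A}^0$-measurable. Therefore,

$$\|\mathbf{P}^x - \mathbf{P}^{x'}\|_{\mathcal{F}^0_{n,\infty}} \ge \|\mathbf{P}^x - \mathbf{P}^{x'}\|_{\mathcal{A}^0} \ge |\mathbf{E}^x[f\circ\Theta^{-N}] - \mathbf{E}^{x'}[f\circ\Theta^{-N}]| > 2-\alpha$$

for all $n\ge 0$. As $\alpha>0$ is arbitrary, condition 3 is contradicted. □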

The characterization in Theorem 2.1 does not require the existence of 
an invariant probability. However, when such a probability exists, we can 
obtain a useful stationary variant of the local zero-two law that will be 
proved next. The advantage of the stationary zero-two law is that it does
not require uniform control in condition 3. On the other hand, the resulting 
convergence only holds for almost every initial condition. 

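Theorem 2.2 (Local stationary zero-two law). Suppose $\mathcal{E}^0$ is countably
generated. Given a $P$-invariant probability $\lambda$, the following are equivalent.

1. The Markov chain is a.e. locally ergodic:

$$\|\mathbf{P}^x - \mathbf{P}^\lambda\|_{\mathcal{F}^0_{n,\infty}} \xrightarrow{n\to\infty} 0 \quad\text{for } \lambda\text{-a.e. } x,$$

or, equivalently,

$$\|\mathbf{P}^x - \mathbf{P}^{x'}\|_{\mathcal{F}^0_{n,\infty}} \xrightarrow{n\to\infty} 0 \quad\text{for } \lambda\otimes\lambda\text{-a.e. } (x,x').$$

2. The local tail $\sigma$-field is a.e. trivial:

$$\mathbf{P}^x(A) = \mathbf{P}^x(A)^2 = \mathbf{P}^{x'}(A) \quad \forall\, A\in\mathcal{A}^0, \quad \lambda\otimes\lambda\text{-a.e. } (x,x').$$

3. The Markov chain is a.e. locally irreducible:

$$\text{for } \lambda\otimes\lambda\text{-a.e. } (x,x'), \quad \exists\, n\ge 0 \text{ such that } \|\mathbf{P}^x - \mathbf{P}^{x'}\|_{\mathcal{F}^0_{n,\infty}} < 2.$$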

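Proof. The equivalence of the two statements of condition 1 follows from

$$\|\mathbf{P}^x - \mathbf{P}^\lambda\|_{\mathcal{F}^0_{n,\infty}} \le \int\|\mathbf{P}^x - \mathbf{P}^{x'}\|_{\mathcal{F}^0_{n,\infty}}\,\lambda(dx'),$$

$$\|\mathbf{P}^x - \mathbf{P}^{x'}\|_{\mathcal{F}^0_{n,\infty}} \le \|\mathbf{P}^x - \mathbf{P}^\lambda\|_{\mathcal{F}^0_{n,\infty}} + \|\mathbf{P}^{x'} - \mathbf{P}^\lambda\|_{\mathcal{F}^0_{n,\infty}}.$$

The proofs of $2\Rightarrow 1\Rightarrow 3$ are identical to the corresponding proofs in Theorem 2.1.
It therefore remains to prove $3\Rightarrow 2$. To this end, define

$$\beta_n(x,x') = \|\mathbf{P}^x - \mathbf{P}^{x'}\|_{\mathcal{F}^0_{n,\infty}}, \qquad \beta(x,x') = \lim_{n\to\infty}\beta_n(x,x').$$

The limit in the definition of $\beta$ exists pointwise as $\beta_n$ is decreasing in $n$.
Moreover, as $\mathcal{E}^0$ is countably generated, the maps $\beta_n$ and $\beta$ are measurable.

Define $\mathbf{Q}^{x,x'} = \mathbf{P}^x\otimes\mathbf{P}^{x'}$ on $\Omega\times\Omega$. Note that $\beta_n\le(P\otimes P)\beta_{n-1}$ pointwise
by Jensen's inequality, so $\beta\le(P\otimes P)\beta$ by dominated convergence. In
particular, as $\lambda\otimes\lambda$ is $P\otimes P$-invariant, this implies $\beta = (P\otimes P)\beta$ $\lambda\otimes\lambda$-a.e.
Thus there is a measurable set $H'\subseteq E\times E$ with $(\lambda\otimes\lambda)(H') = 1$ such that

$$\mathbf{Q}^{x,x'}[\beta(x,x') = \beta(X_n,X_n') \text{ for all } n\ge 0] = 1 \quad\text{for all } (x,x')\in H'.$$

In the remainder of the proof, we assume that condition 3 holds, and we fix
a measurable set $H\subseteq H'$ with $(\lambda\otimes\lambda)(H) = 1$ such that

$$\forall\,(x,x')\in H, \quad \exists\, n\ge 0 \text{ such that } \beta_n(x,x') < 2.$$

Suppose condition 2 does not hold. Then there exist $A\in\mathcal{A}^0$ and $(x,x')\in H$
such that either $0<\mathbf{P}^x(A)<1$ or $\mathbf{P}^x(A)\ne\mathbf{P}^{x'}(A)$. Define $f = 1_A - 1_{A^c}$
and fix $\alpha>0$. Proceeding as in the proof of Theorem 2.1, we find that

$$\mathbf{Q}^{x,x'}\big[|\mathbf{E}^{X_n}[f\circ\Theta^{-n}] - \mathbf{E}^{X_n'}[f\circ\Theta^{-n}]| > 2-\alpha\big] \xrightarrow{n\to\infty} \mathbf{P}^x(A)\,\mathbf{P}^{x'}(A^c) + \mathbf{P}^x(A^c)\,\mathbf{P}^{x'}(A) > 0.$$

Note that as $|f|\le 1$ and $f\circ\Theta^{-n}$ is $\mathcal{A}^0$-measurable, we have

$$|\mathbf{E}^{X_n}[f\circ\Theta^{-n}] - \mathbf{E}^{X_n'}[f\circ\Theta^{-n}]| \le \beta(X_n,X_n') = \beta(x,x') \quad \mathbf{Q}^{x,x'}\text{-a.s.}$$

It follows that $\beta(x,x') > 2-\alpha$, and we therefore have $\beta(x,x') = 2$ as
$\alpha>0$ was arbitrary. But by construction there exists $n\ge 0$ such that
$\beta(x,x')\le\beta_n(x,x')<2$, and we have the desired contradiction. □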

Theorems 2.1 and 2.2, while elementary, play a fundamental role in our 
theory. In the following subsections, we will see that these results have a 
broad range of applicability that goes far beyond the setting of Harris chains. 
2.2. Local mixing in infinite dimension. Markov chains in an infinite-dimensional
state space are rarely amenable to the classical theory of Harris
chains. The key obstacle is that total variation convergence requires non- 
singularity of the transition probabilities. This is not restrictive in finite 
dimension, but fails in infinite dimension even in the most trivial examples. 

Example 2.3. Let $(X_k)_{k\ge 0}$ be the Markov chain in $\{-1,+1\}^{\mathbb{N}}$ such that
each coordinate $(X_k^i)_{k\ge 0}$ is an independent Markov chain in $\{-1,+1\}$ with
transition probabilities $0 < p_{-1,+1} = p_{+1,-1} < 1/2$. Clearly each coordinate
is a Harris chain, and the law of $X_n$ converges weakly as $n\to\infty$ to its
unique invariant measure $\lambda$ for any initial condition. Nonetheless $\delta_{1^{\mathbb{N}}}P^n$ and
$\lambda$ are mutually singular for all $n\ge 0$ ($\delta_{1^{\mathbb{N}}}P^n$ and $\lambda$ possess i.i.d. coordinates
with a different law), so $X_n$ cannot converge in total variation.
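
The phenomenon in this example can be seen numerically on finitely many coordinates. The sketch below is only an illustration under hypothetical parameter values (flip probability 0.25, time $n = 2$): started from the all-ones configuration, each coordinate at time $n$ equals $+1$ with probability $q_n = (1+(1-2p)^n)/2$, while under the invariant law it is a fair coin. Since both laws are i.i.d. products, the total variation distance on the first $N$ coordinates reduces to a distance between two binomial laws.

```python
from math import comb

def prob_plus(p, n):
    """P(X_n^i = +1 | X_0^i = +1) for the symmetric two-state chain with flip probability p."""
    return 0.5 * (1.0 + (1.0 - 2.0 * p) ** n)

def tv_first_coordinates(p, n, N):
    """2 * sup_A |.| between the time-n law started from all +1 and the invariant
    law, restricted to the first N coordinates. Both laws are i.i.d. products,
    so the number of +1 entries is a sufficient statistic."""
    q = prob_plus(p, n)
    return sum(comb(N, k) * abs(q ** k * (1 - q) ** (N - k) - 0.5 ** N)
               for k in range(N + 1))

p = 0.25  # hypothetical flip probability, 0 < p < 1/2
by_dimension = [tv_first_coordinates(p, 2, N) for N in (1, 10, 100, 500)]
by_time = [tv_first_coordinates(p, n, 10) for n in (1, 5, 20)]
print(by_dimension, by_time)
```

For a fixed number of coordinates the distance decays in time, whereas for a fixed time it increases toward 2 as more coordinates are included, consistent with mutual singularity of the full infinite-dimensional laws.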

As the classical measure-theoretic theory fails to yield satisfactory re- 
sults, the ergodic theory of infinite-dimensional Markov chains is frequently 
approached by means of topological methods. A connection between topo- 
logical methods and local zero-two laws will be investigated in section 2.4 
below. On the other hand, one may seek a purely measure-theoretic counter- 
part of the notion of a Harris chain that is adapted to the infinite-dimensional 
setting. We now describe such a notion due to H. Föllmer [15].

Throughout this section, we adopt the same setting as in section 2.1. 
To formalize the notion of an infinite-dimensional Markov chain, we assume 
that the state space $(E,\mathcal{E})$ is contained in a countable product: that is, there
exist a countable set $I$ and measurable spaces $(E^i,\mathcal{E}^i)$ such that

$$(E,\mathcal{E}) \subseteq \prod_{i\in I}(E^i,\mathcal{E}^i).$$

Each $i\in I$ plays the role of a single dimension of the model. We will write
$x = (x^i)_{i\in I}$ for $x\in E$, and for $J\subseteq I$ we denote by $x^J = (x^i)_{i\in J}$ the natural
projection of $x$ onto $\prod_{i\in J}E^i$. For $m<n$, we define the quantities $X^J_{m,n}$ and
$\mathcal{F}^J_{m,n}$ in the obvious manner. Moreover, we define the local tail $\sigma$-fields

$$\mathcal{A}^J = \bigcap_{n\ge 0}\mathcal{F}^J_{n,\infty}, \qquad \mathcal{A}_{\rm loc} = \bigvee_{|J|<\infty}\mathcal{A}^J.$$

That is, $\mathcal{A}_{\rm loc}$ is generated by the asymptotic events associated to all finite-
dimensional projections of the infinite-dimensional chain.

We now introduce Föllmer's notion of local mixing, which states that each
finite-dimensional projection of the model converges in total variation. Let us 
emphasize, as in the previous section, that the finite-dimensional projection 
of an infinite-dimensional Markov chain is generally not Markov. 






XIN THOMSON TONG AND RAMON VAN HANDEL 



Definition 2.4 (Local mixing). A Markov chain $(X_k)_{k\ge 0}$ taking values
in the countable product space $(E,\mathcal{E})\subseteq\prod_{i\in I}(E^i,\mathcal{E}^i)$ is locally mixing if

$$\|\mathbf{P}^\mu - \mathbf{P}^\nu\|_{\mathcal{F}^J_{n,\infty}} \xrightarrow{n\to\infty} 0 \quad\text{for all } \mu,\nu\in\mathcal{P}(E) \text{ and } J\subseteq I,\ |J|<\infty.$$

In the finite-dimensional case $|I|<\infty$, this definition reduces to the
ergodic property of Harris chains. Moreover, in the infinite-dimensional setting,
Föllmer [15] proves a characterization of local mixing in complete analogy
with the Blackwell-Orey equivalence in the theory of Harris chains [34,
Ch. 6]. It therefore appears that local mixing is the natural measure-theoretic 
generalization of the Harris theory to the infinite dimensional setting. 

Unfortunately, the characterization given in [15] is of limited use for the 
purpose of establishing the local mixing property of a given Markov chain: 
only a very strong verifiable sufficient condition is given there (in the spirit 
of the Dobrushin uniqueness condition for Gibbs measures). The missing 
ingredient is a zero-two law, which we can now give as a simple corollary of 
the results in section 2.1. This completes the characterization of local mixing 
given in [15], and provides a concrete tool to verify this property. 

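Corollary 2.5 (Local mixing theorem). The following are equivalent.

1. $(X_k)_{k\ge 0}$ is locally mixing.

2. $\mathcal{A}_{\rm loc}$ is $\mathbf{P}^\mu$-trivial for every $\mu\in\mathcal{P}(E)$.

3. For every $J\subseteq I$, $|J|<\infty$, there exists $\alpha>0$ such that

$$\forall\, x,x'\in E, \quad \exists\, n\ge 0 \text{ such that } \|\mathbf{P}^x - \mathbf{P}^{x'}\|_{\mathcal{F}^J_{n,\infty}} \le 2-\alpha.$$

Proof. Note that $\mathcal{A}_{\rm loc}$ is $\mathbf{P}^\mu$-trivial if and only if $\mathcal{A}^J$ is $\mathbf{P}^\mu$-trivial for all
$|J|<\infty$. Thus the result follows immediately from Theorem 2.1. □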

Condition 3 of Corollary 2.5 can be used directly to verify the local mix- 
ing property in infinite-dimensional models that possess a sufficient degree 
of nondegeneracy. For example, in the setting of stochastic Navier-Stokes 
equations with additive noise (cf. section 5.2), the approach developed in 
[13, 27] can be used to show that Condition 3 holds under the assumption 
that every Fourier mode is forced by an independent Brownian motion (in 
this setting, each dimension $i\in I$ corresponds to a single Fourier mode
of the system). However, in degenerate models (for example, where some 
modes are unforced or when the noise is not additive), local mixing may be 
difficult or impossible to establish. In section 2.4 below, we will introduce a 
technique that will significantly extend the applicability of our results. 
Remark 2.6. One can of course also obtain a stationary counterpart
of Corollary 2.5 by applying Theorem 2.2 rather than Theorem 2.1. As the 
result is essentially identical, we do not state it explicitly. 

2.3. Ergodicity of non-Markov processes. As the local zero-two laws in- 
troduced in section 2.1 are essentially non-Markovian, they can be used to 
investigate the ergodic theory of non-Markov processes. Let us illustrate this 
idea by developing a new (to the best of our knowledge) characterization of 
stationary absolutely regular sequences. 

In this section, we assume that {E, £) is a Polish space (the Polish assump- 
tion is made to ensure the existence of regular conditional probabilities), and 
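let $\mathbf{P}$ be a stationary probability measure on $(\Omega,\mathcal{F})$. Let us recall the well-known
notion of absolute regularity [45] (sometimes called $\beta$-mixing).

Definition 2.7 (Absolute regularity). A stationary sequence $(X_k)_{k\in\mathbb{Z}}$
is said to be absolutely regular if the following holds:

$$\|\mathbf{P}[X_{-\infty,0},X_{k,\infty}\in\cdot\,] - \mathbf{P}[X_{-\infty,0}\in\cdot\,]\otimes\mathbf{P}[X_{k,\infty}\in\cdot\,]\| \xrightarrow{k\to\infty} 0.$$

We obtain the following characterization.

Corollary 2.8. Let $(X_k)_{k\in\mathbb{Z}}$ be a stationary sequence. Choose any version
$\mathbf{P}^{x_{-\infty,0}}$ of the regular conditional probability $\mathbf{P}[\,\cdot\,|\,\mathcal{F}_-]$, and define the
measure $\mathbf{P}^- = \mathbf{P}[X_{-\infty,0}\in\cdot\,]$. The following are equivalent:

1. $(X_k)_{k\in\mathbb{Z}}$ is absolutely regular.

2. $\|\mathbf{P}^{x_{-\infty,0}} - \mathbf{P}\|_{\mathcal{F}_{k,\infty}} \xrightarrow{k\to\infty} 0$ for $\mathbf{P}^-$-a.e. $x_{-\infty,0}$.

3. For $\mathbf{P}^-\otimes\mathbf{P}^-$-a.e. $(x_{-\infty,0},\bar{x}_{-\infty,0})$, there exists $k\ge 0$ such that $\mathbf{P}^{x_{-\infty,0}}$
and $\mathbf{P}^{\bar{x}_{-\infty,0}}$ are not mutually singular on $\mathcal{F}_{k,\infty}$.

Proof. It is standard (for example, [45]) that $(X_k)_{k\in\mathbb{Z}}$ is absolutely regular
if and only if $\mathbf{E}[\|\mathbf{P}[X_{k,\infty}\in\cdot\,|\,\mathcal{F}_-] - \mathbf{P}[X_{k,\infty}\in\cdot\,]\|]\to 0$ as $k\to\infty$.
However, as $\|\mathbf{P}[X_{k,\infty}\in\cdot\,|\,\mathcal{F}_-] - \mathbf{P}[X_{k,\infty}\in\cdot\,]\|$ is pointwise decreasing in $k$,
the equivalence between conditions 1 and 2 follows immediately.

Now define the $E^{\mathbb{Z}_-}$-valued process $Z_k = X_{-\infty,k}$. Then $(Z_k)_{k\in\mathbb{Z}}$ is clearly a
Markov chain with transition kernel $Q(z,A) = \mathbf{P}^z[(z,X_1)\in A]$ and invariant
probability $\mathbf{P}^-$. We apply Theorem 2.2 to the Markov chain $(Z_k)_{k\in\mathbb{Z}}$, where
$\mathcal{E}^0$ in Theorem 2.2 is the $\sigma$-field generated by the first coordinate of $E^{\mathbb{Z}_-}$.
This yields immediately the equivalence between conditions 2 and 3. □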
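
For orientation, the quantity $\mathbf{E}\|\mathbf{P}[X_{k,\infty}\in\cdot\,|\,\mathcal{F}_-] - \mathbf{P}[X_{k,\infty}\in\cdot\,]\|$ used in the proof can be evaluated explicitly in the simplest Markov case. The sketch below assumes a stationary finite-state Markov chain with a hypothetical two-state transition matrix, which is not an example from the paper. By the Markov property, the conditional law of the future given the past depends only on $X_0$, and the distance between path laws from time $k$ onward reduces to the distance between the time-$k$ marginals, so the quantity equals $\sum_x\pi(x)\,\|P^k(x,\cdot)-\pi\|$.

```python
def mat_mul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][l] * B[l][j] for l in range(n)) for j in range(n)]
            for i in range(n)]

def beta_coefficient(P, pi, k):
    """sum_x pi(x) * ||P^k(x, .) - pi||, with ||.|| the l1 (= 2 sup_A) distance."""
    n = len(P)
    Pk = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(k):
        Pk = mat_mul(Pk, P)
    return sum(pi[x] * sum(abs(Pk[x][y] - pi[y]) for y in range(n))
               for x in range(n))

a, b = 0.2, 0.3                      # hypothetical transition probabilities
P = [[1 - a, a], [b, 1 - b]]
pi = [b / (a + b), a / (a + b)]      # invariant law of the two-state chain
betas = [beta_coefficient(P, pi, k) for k in range(1, 6)]
print(betas)
```

For this two-state chain the coefficient has the closed form $4\pi_0\pi_1|1-a-b|^k$, so it decays geometrically and the sequence is absolutely regular.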

While conditions 1 and 2 of Corollary 2.8 are standard, condition 3 ap- 
pears at first sight to be substantially weaker: all that is needed is that, for 
almost every pair of initial histories, we can couple the future evolutions with
nonzero success probability. This is reminiscent of the corresponding result
for Harris chains, and one could argue that absolutely regular sequences pro- 
vide a natural generalization of the Harris theory to non-Markov processes. 
In this spirit, Berbee [1] has shown that absolutely regular sequences admit 
a decomposition into cyclic classes much like in the Markov setting. 

The elementary observation used in the proof of Corollary 2.8 is that any 
non-Markov process can be made Markov by considering the history process
$Z_k = X_{-\infty,k}$. However, the process $Z_k$ is highly degenerate: its transition
probabilities are mutually singular for any distinct pair of initial conditions. 
For this reason, the classical Harris theory is of no use in investigating the 
ergodicity of non-Markov processes; the local nature of the zero-two laws 
developed in section 2.1 is the key to obtaining nontrivial results. 

Remark 2.9. Along the same lines, one can also obtain a counterpart of 
Theorem 2.1 for non-Markov processes. The latter is useful for the investiga- 
tion of delay equations with infinite memory. As no new ideas are involved, 
we leave the formulation of such a result to the reader. 

2.4. Weak convergence and asymptotic coupling. In the previous sections,
we have employed the local zero-two laws directly to obtain ergodic
properties in the total variation distance. However, even local total variation 
convergence is still too strong a requirement in many cases of interest. In this 
section, we introduce a technique that allows us to deduce ergodic proper- 
ties of weak convergence type from the local zero-two laws. This significantly 
extends the range of applicability of our techniques. 

Throughout this section, we adopt the same setting as in section 2.1. We
will assume in addition that the state space $E$ is Polish and is endowed with
its Borel σ-field $\mathcal{E}$ and a complete metric $d$. Denote by $U_b(E)$ the uniformly
continuous and bounded functions on $E$, and let $\mathrm{Lip}(E)$ be the class of
functions $f\in U_b(E)$ such that $\|f\|_\infty \le 1$ and $|f(x)-f(y)| \le d(x,y)$ for
all $x,y\in E$. Let $\mathcal{M}(E)$ be the space of signed finite measures on $E$, and
define the bounded-Lipschitz norm $\|\varrho\|_{\mathrm{BL}} = \sup_{f\in\mathrm{Lip}(E)} |\varrho f|$ for $\varrho\in\mathcal{M}(E)$.
We recall for future reference that $x\mapsto \|K(x,\cdot) - K'(x,\cdot)\|_{\mathrm{BL}}$ is measurable
when $K,K'$ are finite kernels; see, for example, [41, Lemma A.1].

A coupling of two probability measures $\mathbf{P}_1, \mathbf{P}_2$ on $\Omega$ is a probability
measure $\mathbf{Q}$ on $\Omega\times\Omega$ such that the first marginal of $\mathbf{Q}$ coincides with $\mathbf{P}_1$ and
the second marginal coincides with $\mathbf{P}_2$. Let us denote the family of all
couplings of $\mathbf{P}_1,\mathbf{P}_2$ by $\mathcal{C}(\mathbf{P}_1,\mathbf{P}_2)$. To set the stage for our result, let us recall
the coupling characterization of the total variation distance:

$$\|\mathbf{P}_1 - \mathbf{P}_2\|_{\mathcal{F}_{n,\infty}} = 2\min\{\mathbf{Q}[X_{n,\infty}\ne X'_{n,\infty}] : \mathbf{Q}\in\mathcal{C}(\mathbf{P}_1,\mathbf{P}_2)\}.$$
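
The coupling characterization can be checked by hand on a finite state space. The following is a minimal sketch under that assumption, not part of the original argument: the helper names `tv_norm` and `maximal_coupling` are hypothetical, and the norm uses the convention of this paper, in which the total variation distance between probability measures takes values in $[0,2]$.

```python
from fractions import Fraction


def tv_norm(p, q):
    """Total variation norm ||p - q|| of two pmfs given as dicts (range [0, 2])."""
    keys = set(p) | set(q)
    return sum(abs(p.get(k, 0) - q.get(k, 0)) for k in keys)


def maximal_coupling(p, q):
    """Build a coupling of p and q that puts as much mass as possible on {x = x'}."""
    keys = sorted(set(p) | set(q))
    overlap = {k: min(p.get(k, 0), q.get(k, 0)) for k in keys}
    mass = sum(overlap.values())
    coupling = {(k, k): m for k, m in overlap.items() if m > 0}
    if mass < 1:
        # The residual measures have disjoint supports; couple them independently.
        res_p = {k: p.get(k, 0) - overlap[k] for k in keys}
        res_q = {k: q.get(k, 0) - overlap[k] for k in keys}
        for a, pa in res_p.items():
            for b, qb in res_q.items():
                if pa > 0 and qb > 0:
                    coupling[(a, b)] = pa * qb / (1 - mass)
    return coupling


def prob_differ(coupling):
    """Probability that the two coordinates of the coupling differ."""
    return sum(m for (a, b), m in coupling.items() if a != b)


p = {0: Fraction(1, 2), 1: Fraction(1, 2)}
q = {0: Fraction(1, 4), 1: Fraction(1, 4), 2: Fraction(1, 2)}
Q = maximal_coupling(p, q)
print(tv_norm(p, q), 2 * prob_differ(Q))
```

Exact rational arithmetic is used so that the identity $\|p-q\| = 2\,Q[X\ne X']$ for the maximal coupling can be compared without rounding error.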

Consider for simplicity the classical zero-two law (Theorem 2.1 for $\mathcal{E}^0 = \mathcal{E}$).
Its basic condition reads: there exists $\alpha > 0$ such that

$$\forall\, x,x'\in E \quad \exists\, n\ge 0 \quad\text{such that}\quad \|\mathbf{P}^x - \mathbf{P}^{x'}\|_{\mathcal{F}_{n,\infty}} \le 2-\alpha.$$

By the coupling characterization of the total variation distance, this
condition can be equivalently stated as follows: there exists $\alpha > 0$ such that

$$\forall\, x,x'\in E \quad \exists\, \mathbf{Q}\in\mathcal{C}(\mathbf{P}^x,\mathbf{P}^{x'}) \quad\text{such that}\quad \mathbf{Q}\Big[\sum_{n=0}^\infty \mathbf{1}_{X_n\ne X'_n} < \infty\Big] \ge \alpha.$$

The message of the following theorem is that if one replaces the discrete
distance $\mathbf{1}_{X_n\ne X'_n}$ by the topological distance $d(X_n,X'_n)$, one obtains an ergodic
theorem with respect to the bounded-Lipschitz (rather than total variation)
distance. This is a much weaker assumption: it is not necessary to construct
an exact coupling where $X_n, X'_n$ eventually coincide with positive probability,
but only an asymptotic coupling where $X_n, X'_n$ converge towards each
other. The latter can often be accomplished even in degenerate situations.

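Theorem 2.10 (Weak-* ergodicity). Suppose there exists $\alpha > 0$ so that

$$\forall\, x,x'\in E \quad \exists\, \mathbf{Q}\in\mathcal{C}(\mathbf{P}^x,\mathbf{P}^{x'}) \quad\text{such that}\quad \mathbf{Q}\Big[\sum_{n=0}^\infty d(X_n,X'_n)^2 < \infty\Big] \ge \alpha.$$

Then the Markov chain is weak-* ergodic in the sense that

$$\|\mu P^n - \nu P^n\|_{\mathrm{BL}} \xrightarrow{n\to\infty} 0 \quad\text{for every } \mu,\nu\in\mathcal{P}(E).$$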

It is interesting to compare Theorem 2.10 to the weak-* ergodic theorems 
obtained in [17, section 2.2] in terms of asymptotic coupling. In contrast to 
those results, Theorem 2.10 requires no specific recurrence structure, Markovian
couplings, or control on the coupling probability $\alpha$ as a function of $x,x'$.
On the other hand, Theorem 2.10 requires the asymptotic coupling to converge
sufficiently quickly so that $\sum_n d(X_n,X'_n)^2 < \infty$ (this is not a serious
issue in most applications), while the results in [17] are in principle applicable
to couplings with an arbitrarily slow convergence rate. These results
therefore provide complementary conditions for weak-* ergodicity. 
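
To make the summability requirement concrete, here is a small sketch on a toy model that is not taken from the paper: a scalar recursion $X_{n+1} = aX_n + \xi_{n+1}$ with $|a|<1$ and $\pm1$ noise, where both copies are driven by the same noise (a synchronous coupling). Under this particular coupling the two copies never meet when $x\ne x'$, yet $d(X_n,X'_n) = |a|^n|x-x'|$, so $\sum_n d(X_n,X'_n)^2 = |x-x'|^2/(1-a^2) < \infty$. The function names and parameters below are illustrative assumptions.

```python
import random


def synchronous_coupling_sum(x, x_prime, a, steps, seed=0):
    """Partial sum of d(X_n, X'_n)^2 over n = 0..steps-1 when both copies of
    X_{n+1} = a * X_n + noise are driven by the same +/-1 noise sequence."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(steps):
        total += (x - x_prime) ** 2
        noise = rng.choice((-1.0, 1.0))
        x = a * x + noise
        x_prime = a * x_prime + noise
    return total


def closed_form_sum(x, x_prime, a, steps):
    """Geometric-series value of the same partial sum."""
    return (x - x_prime) ** 2 * (1.0 - a ** (2 * steps)) / (1.0 - a ** 2)


partial = synchronous_coupling_sum(3.0, -1.0, a=0.5, steps=200)
print(partial, closed_form_sum(3.0, -1.0, a=0.5, steps=200))
```

The simulated partial sums agree with the geometric series regardless of the noise realization, which is the kind of deterministic contraction that makes the hypothesis of Theorem 2.10 easy to verify in dissipative models.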

However, it should be emphasized that the feature of Theorem 2.10 that 
is of key importance for our purposes is that its proof reduces the problem to 
the local zero-two law of section 2.1. Using this technique, we can therefore
extend the applicability of purely measure-theoretic results that are based 
on zero-two laws to a wide class of weak-* ergodic Markov chains. This idea 
will be crucial to establishing conditional ergodicity in degenerate infinite- 
dimensional models (see section 5 for examples). 



Proof of Theorem 2.10. Let $(\bar E,\bar{\mathcal{E}}) = (E\times\mathbb{R}, \mathcal{E}\otimes\mathcal{B}(\mathbb{R}))$. Consider the
$\bar E$-valued process $(Z_n)_{n\ge 0}$ (defined on its canonical probability space) such
that $Z_n = (X_n,\xi_n)$, where $(\xi_n)_{n\ge 0}$ is an i.i.d. sequence of standard Gaussian
random variables independent of the Markov chain $(X_n)_{n\ge 0}$. Clearly $(Z_n)_{n\ge 0}$
is itself a Markov chain. Given $f\in\mathrm{Lip}(E)$, we will apply Theorem 2.1 to
$(Z_n)_{n\ge 0}$ with the local σ-field $\bar{\mathcal{E}}^0 = \sigma\{g\}$, $g(x,y) = f(x)+y$.

We begin by noting a standard estimate. 



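Lemma 2.11. Let $(\xi_n)_{n\ge 0}$ be an i.i.d. sequence of standard Gaussian
random variables, and let $(a_n)_{n\ge 0}$ and $(b_n)_{n\ge 0}$ be real-valued sequences. Then

$$\big\|\mathbf{P}[(a_n+\xi_n)_{n\ge 0}\in\cdot\,] - \mathbf{P}[(b_n+\xi_n)_{n\ge 0}\in\cdot\,]\big\|^2 \le \sum_{n=0}^\infty (b_n-a_n)^2.$$

Proof. Denote by $H(\mu,\nu) = \int\sqrt{d\mu\,d\nu}$ the Kakutani-Hellinger affinity
between probability measures $\mu,\nu$. We recall that [36, section III.9]

$$\|\mu_1\otimes\cdots\otimes\mu_n - \nu_1\otimes\cdots\otimes\nu_n\|^2 \le 8\Big[1-\prod_{k=1}^n H(\mu_k,\nu_k)\Big].$$

But a direct computation shows that $H(N(a,1),N(b,1)) = \exp(-(b-a)^2/8)$.
The result now follows directly using $1-e^{-x}\le x$ and $n\to\infty$. □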
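
The closed form for the Gaussian affinity used in this proof is easy to confirm numerically. The sketch below is only a sanity check under stated assumptions: it approximates $H(N(a,1),N(b,1))$ by a midpoint rule on a truncated interval (the truncation bounds and step count are arbitrary choices), and it evaluates the quantity $8[1-\prod_k H_k]$ against $\sum_k (b_k-a_k)^2$ for finite sequences. The function names are hypothetical.

```python
import math


def gaussian_pdf(x, mean):
    """Density of N(mean, 1) at x."""
    return math.exp(-0.5 * (x - mean) ** 2) / math.sqrt(2.0 * math.pi)


def hellinger_affinity(a, b, lo=-30.0, hi=30.0, steps=60000):
    """Midpoint-rule approximation of H(N(a,1), N(b,1)) = int sqrt(p_a * p_b) dx."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * h
        total += math.sqrt(gaussian_pdf(x, a) * gaussian_pdf(x, b))
    return total * h


def affinity_bound(a_seq, b_seq):
    """Evaluate 8 * (1 - prod_k H_k) using the closed form H_k = exp(-(b_k - a_k)^2 / 8)."""
    exponent = sum((b - a) ** 2 for a, b in zip(a_seq, b_seq)) / 8.0
    return 8.0 * (1.0 - math.exp(-exponent))


a_seq, b_seq = [0.0, 0.0, 0.0], [1.0, 0.5, 0.25]
print(hellinger_affinity(0.3, 1.7), math.exp(-(1.7 - 0.3) ** 2 / 8.0))
print(affinity_bound(a_seq, b_seq), sum((b - a) ** 2 for a, b in zip(a_seq, b_seq)))
```

The second printed pair illustrates the final step of the proof: the affinity bound never exceeds the sum of squared shifts, because $1-e^{-x}\le x$.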

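Fix $x,x'\in E$ and $f\in\mathrm{Lip}(E)$, and choose $\mathbf{Q}\in\mathcal{C}(\mathbf{P}^x,\mathbf{P}^{x'})$ as in the
statement of the theorem. By assumption, we can choose $n\ge 0$ such that

$$\mathbf{Q}\Big[\sum_{k=n}^\infty d(X_k,X'_k)^2 \le \frac{\alpha^2}{4}\Big] \ge \frac{3\alpha}{4}.$$

Let $F_n = f(X_n)+\xi_n$, and define for every real-valued sequence $a = (a_n)_{n\ge 0}$
the measure $\mu_a = \mathbf{P}[(a_n+\xi_n)_{n\ge 0}\in\cdot\,]$. Then we have for every $A\in\mathcal{B}(\mathbb{R})^{\mathbb{Z}_+}$

$$\mathbf{P}^x[F_{n,\infty}\in A] - \mathbf{P}^{x'}[F_{n,\infty}\in A] = \mathbf{E}_{\mathbf{Q}}\big[\mu_{(f(X_k))_{k\ge n}}(A) - \mu_{(f(X'_k))_{k\ge n}}(A)\big].$$

Therefore, we obtain by Jensen's inequality and Lemma 2.11

$$\big\|\mathbf{P}^x[F_{n,\infty}\in\cdot\,] - \mathbf{P}^{x'}[F_{n,\infty}\in\cdot\,]\big\|
\le \mathbf{E}_{\mathbf{Q}}\Big[\Big(\sum_{k=n}^\infty \{f(X_k)-f(X'_k)\}^2\Big)^{1/2}\wedge 2\Big]
\le \mathbf{E}_{\mathbf{Q}}\Big[\Big(\sum_{k=n}^\infty d(X_k,X'_k)^2\Big)^{1/2}\wedge 2\Big]
\le \frac{\alpha}{2} + 2\Big(1-\frac{3\alpha}{4}\Big) = 2-\alpha,$$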



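where we have used the Lipschitz property of $f$. Applying Theorem 2.1 as
indicated at the beginning of the proof, it follows that

$$\big\|\mathbf{P}^{\mu}[F_{n,\infty}\in\cdot\,] - \mathbf{P}^{\nu}[F_{n,\infty}\in\cdot\,]\big\| \xrightarrow{n\to\infty} 0 \quad\text{for all } \mu,\nu\in\mathcal{P}(\bar E).$$

In particular, if we denote by $\xi\in\mathcal{P}(\mathbb{R})$ the standard Gaussian measure, then

$$\|\mu P^n f^{-1} * \xi - \nu P^n f^{-1} * \xi\| \xrightarrow{n\to\infty} 0 \quad\text{for all } \mu,\nu\in\mathcal{P}(E)$$

(here $*$ denotes convolution). We claim that this implies

$$|\mu P^n f - \nu P^n f| \xrightarrow{n\to\infty} 0 \quad\text{for all } \mu,\nu\in\mathcal{P}(E) \text{ and } f\in\mathrm{Lip}(E).$$

Indeed, if we assume the contrary, then there exists for some $f\in\mathrm{Lip}(E)$
and $\mu,\nu\in\mathcal{P}(E)$ a subsequence $m_n\uparrow\infty$ so that $\inf_n |\mu P^{m_n} f - \nu P^{m_n} f| > 0$.
As $f$ takes values in the compact interval $[-1,1]$, we can extract a further
subsequence $k_n\uparrow\infty$ so that $\mu P^{k_n} f^{-1}\to\mu_\infty$ and $\nu P^{k_n} f^{-1}\to\nu_\infty$ in the weak
convergence topology for some $\mu_\infty,\nu_\infty\in\mathcal{P}([-1,1])$, and clearly $\mu_\infty\ne\nu_\infty$
by construction. On the other hand, as $\|\mu P^{k_n} f^{-1} * \xi - \nu P^{k_n} f^{-1} * \xi\|\to 0$,
we must have $\mu_\infty * \xi = \nu_\infty * \xi$. This entails a contradiction, as the Fourier
transform of $\xi$ vanishes nowhere (so convolution by $\xi$ is injective on $\mathcal{P}(\mathbb{R})$).
We finally claim that in fact

$$\|\mu P^n - \nu P^n\|_{\mathrm{BL}} \xrightarrow{n\to\infty} 0 \quad\text{for all } \mu,\nu\in\mathcal{P}(E).$$

Indeed, we have shown above that the signed measure $\varrho_n = \mu P^n - \nu P^n$
converges to zero pointwise on $\mathrm{Lip}(E)$. As any function in $U_b(E)$ can be
approximated uniformly by bounded Lipschitz functions, this implies that
$\varrho_n\to 0$ in the $\sigma(\mathcal{M}(E),U_b(E))$-topology. A result of Pachl [33, Theorem 3.2]
now implies that $\|\varrho_n\|_{\mathrm{BL}}\to 0$, which concludes the proof of the theorem. □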
For simplicity, we have stated the assumption of Theorem 2.10 so that an
asymptotic coupling of the entire state $X_k$ of the Markov chain is required.
The reader may easily adapt the proof of Theorem 2.10 to require only
asymptotic couplings of finite-dimensional projections as in the local mixing
setting (Corollary 2.5), or to deduce a variant of this result in the setting of
non-Markov processes. However, let us emphasize that even in the setting of
Theorem 2.10, where the asymptotic coupling is at the level of the Markov
process $X_k$, the "smoothed" process $F_k = f(X_k)+\xi_k$ that appears in the
proof is non-Markovian. Therefore, the local zero-two law is essential in order
to obtain weak-* ergodicity results from the total variation theory.

The stationary counterpart to Theorem 2.10 also follows along the same 
lines. However, here a small modification is needed at the end of the proof. 

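Theorem 2.12 (Stationary weak-* ergodicity). Let $\lambda$ be a $P$-invariant
probability. Suppose that for $\lambda\otimes\lambda$-a.e. $(x,x')\in E\times E$,

$$\exists\, \mathbf{Q}\in\mathcal{C}(\mathbf{P}^x,\mathbf{P}^{x'}) \quad\text{such that}\quad \mathbf{Q}\Big[\sum_{n=0}^\infty d(X_n,X'_n)^2 < \infty\Big] > 0.$$

Then the Markov chain is a.e. weak-* ergodic in the sense that

$$\|P^n(x,\cdot) - \lambda\|_{\mathrm{BL}} \xrightarrow{n\to\infty} 0 \quad\text{for } \lambda\text{-a.e. } x\in E.$$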



Proof. Repeating the proof of Theorem 2.10 using Theorem 2.2 instead
of Theorem 2.1 yields the following: for every $f\in\mathrm{Lip}(E)$, we have

$$|P^n f(x) - \lambda f| \xrightarrow{n\to\infty} 0 \quad\text{for } \lambda\text{-a.e. } x.$$

We would like to extend this to convergence in the bounded-Lipschitz norm.
This does not follow immediately, however, as the $\lambda$-null set of $x\in E$ for
which the convergence fails may depend on $f\in\mathrm{Lip}(E)$.

Fix $\varepsilon > 0$. Let $K\subseteq E$ be a compact set such that $\lambda(K)\ge 1-\varepsilon$, and
define $\chi(x) = (1-\varepsilon^{-1}d(x,K))_+$. Then we can estimate

$$\|P^n(x,\cdot) - \lambda\|_{\mathrm{BL}} \le \sup_{f\in\mathrm{Lip}(E)} |P^n(f\chi)(x) - \lambda(f\chi)| + P^n(1-\chi)(x) + \lambda(1-\chi)
\le \sup_{f\in\mathrm{Lip}(E)} |P^n(f\chi)(x) - \lambda(f\chi)| + |P^n\chi(x) - \lambda\chi| + 2\varepsilon.$$

By the Arzelà-Ascoli theorem, we can find a finite number of functions
$f_1,\ldots,f_k\in\mathrm{Lip}(E)$ such that $\sup_{f\in\mathrm{Lip}(E)}\min_i \|f\mathbf{1}_K - f_i\mathbf{1}_K\|_\infty \le \varepsilon$. But
note that $|f(x)-g(x)| \le 2\varepsilon + \|f\mathbf{1}_K - g\mathbf{1}_K\|_\infty$ whenever $d(x,K)\le\varepsilon$ and
$f,g\in\mathrm{Lip}(E)$. Therefore, $\sup_{f\in\mathrm{Lip}(E)}\min_i \|f\chi - f_i\chi\|_\infty \le 3\varepsilon$, and we have

$$\|P^n(x,\cdot) - \lambda\|_{\mathrm{BL}} \le \max_{i=1,\ldots,k} |P^n(f_i\chi)(x) - \lambda(f_i\chi)| + |P^n\chi(x) - \lambda\chi| + 8\varepsilon.$$

As the quantity on the right-hand side depends only on a finite number of
bounded Lipschitz functions $\chi, f_1\chi,\ldots,f_k\chi$, we certainly have

$$\limsup_{n\to\infty} \|P^n(x,\cdot) - \lambda\|_{\mathrm{BL}} \le 8\varepsilon \quad\text{for } \lambda\text{-a.e. } x.$$

But $\varepsilon > 0$ was arbitrary, so the proof is complete. □

Remark 2.13. The tightness argument used here is in fact more ele- 
mentary than the result of Pachl [33] used in the proof of Theorem 2.10. 
Note, however, that Theorem 2.10 does not even require the existence of an 
invariant probability, so that tightness is not guaranteed in that setting. 

3. Conditional ergodicity. In the previous section, we have developed 
various measure-theoretic ergodic theorems that are applicable in infinite- 
dimensional or non-Markov settings. The goal of the present section is to 
develop a conditional variant of these ideas: given a stationary process 
$(Z_k,Y_k)_{k\in\mathbb{Z}}$, we aim to understand when $(Z_k)_{k\in\mathbb{Z}}$ is ergodic conditionally
on $(Y_k)_{k\in\mathbb{Z}}$. The conditional ergodic theory developed in this section will be
used in section 4 below to prove stability and ergodicity of nonlinear filters. 

In section 3.1, we first develop a conditional variant of the zero-two laws
of the previous section. In principle, this result completely characterizes the
conditional absolute regularity property of $(Z_k)_{k\in\mathbb{Z}}$ given $(Y_k)_{k\in\mathbb{Z}}$.
Unfortunately, the equivalent conditions of the zero-two law are stated in terms of
the conditional distribution $\mathbf{P}[Z\in\cdot\,|\mathcal{F}^Y]$: this quantity is defined abstractly
as a regular conditional probability, but an explicit expression is almost
never available. Therefore, in itself, the conditional zero-two law is very
difficult to use. In contrast, the (unconditional) ergodic theory of $(Z_k,Y_k)_{k\in\mathbb{Z}}$
can typically be studied by direct analysis of the underlying model.

The question that we aim to address is therefore that of inheritance: if
the unconditional process $(Z_k,Y_k)_{k\in\mathbb{Z}}$ is absolutely regular, does this imply
that the process is also conditionally absolutely regular? In general this is
not the case even when the process is Markov (cf. [43]). However, under a
suitable nondegeneracy requirement, we will be able to establish inheritance
of the absolute regularity property from the unconditional to the conditional
process (section 3.2). Using an additional argument (section 3.3), we will also
deduce conditional ergodicity given the one-sided process $(Y_k)_{k\ge 0}$ rather
than the two-sided process $(Y_k)_{k\in\mathbb{Z}}$, as will be needed in section 4 below.
The inheritance of the ergodicity property under conditioning was first
established in the Markov setting in [40, 39]. For a Markov process $(X_k)_{k\ge 0}$,
the condition of the zero-two law states that for a.e. initial conditions $x,x'$,
there exists $n\ge 0$ such that $P^n(x,\cdot)$ and $P^n(x',\cdot)$ are not mutually singular,
while the conditional zero-two law yields essentially the same condition
where the transition kernel $P(x,\cdot) = \mathbf{P}^x[X_1\in\cdot\,]$ is replaced by the conditional
transition kernel $\tilde P(x,\cdot) = \mathbf{P}^x[X_1\in\cdot\,|\mathcal{F}^Y]$. The key idea in the proof
of inheritance was to show that the unconditional and conditional transition
kernels are equivalent, $P(x,\cdot)\sim\tilde P(x,\cdot)$ a.e., which immediately yields equivalence
of the conditions of the unconditional and conditional zero-two laws.
Unfortunately, such an approach cannot work in the non-Markov setting, as
here the corresponding argument would require us to show the equivalence
of laws of the infinite future $\mathbf{P}[Z_{k,\infty}\in\cdot\,|\mathcal{F}_{-\infty,0}]$ and $\mathbf{P}[Z_{k,\infty}\in\cdot\,|\mathcal{F}^Y\vee\mathcal{F}^Z_{-\infty,0}]$.
Such an equivalence on the infinite time interval cannot hold except in trivial
cases (even in the Markov setting). Instead, we develop a new method to
establish inheritance of the conditions of the unconditional and conditional
zero-two laws that avoids the Markov-specific argument used in [40, 39].

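Throughout this section, we adopt the same setting and notations as in
section 2. Here we will assume that $E = G\times F$ where $G, F$ are Polish spaces,
and we fix a stationary probability $\mathbf{P}$ on $(\Omega,\mathcal{F})$. We denote the components
of the coordinate process as $X_n = (Z_n,Y_n)$. Thus $(Z_n,Y_n)_{n\in\mathbb{Z}}$ is a stationary
process in $G\times F$ defined on the canonical probability space $(\Omega,\mathcal{F},\mathbf{P})$. Let

$$\mathcal{F}^Z_{m,n} = \sigma\{Z_{m,n}\}, \qquad \mathcal{F}^Y_{m,n} = \sigma\{Y_{m,n}\},$$

so that $\mathcal{F}_{m,n} = \mathcal{F}^Z_{m,n}\vee\mathcal{F}^Y_{m,n}$. We also define the tail σ-field

$$\mathcal{A}^Z = \bigcap_{n\ge 0}\mathcal{F}^Z_{n,\infty}.$$

For simplicity, we will write $Z = Z_{-\infty,\infty}$ and $Y = Y_{-\infty,\infty}$. We introduce the
convention that for a sequence $z = (z_n)_{n\in\mathbb{Z}}$, we write $z_- = (z_n)_{n\le 0}$.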

3.1. A conditional zero-two law. In this section, we establish a condi- 
tional counterpart to Corollary 2.8 that characterizes the absolute regularity 
property of $(Z_n)_{n\in\mathbb{Z}}$ conditionally on $(Y_n)_{n\in\mathbb{Z}}$. While it is difficult to apply
directly, this conditional zero-two law plays a fundamental role in the study 
of conditional ergodicity to be undertaken in the following sections. 

In the following, we define the probability measure $\mathbf{Y}$ and fix versions of
the regular conditional probabilities $\mathbf{P}_y$, $\mathbf{P}^{z_-}_y$ as follows:

$$\mathbf{Y} = \mathbf{P}[Y\in\cdot\,], \qquad \mathbf{P}_y = \mathbf{P}[Z\in\cdot\,|\mathcal{F}^Y], \qquad \mathbf{P}^{z_-}_y = \mathbf{P}[Z\in\cdot\,|\mathcal{F}^Y\vee\mathcal{F}^Z_{-\infty,0}].$$

The following is the main result of this section.

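Theorem 3.1 (Conditional 0-2 law). The following are equivalent:

1. For $(\mathbf{P}_y\otimes\mathbf{P}_y)\mathbf{Y}$-a.e. $(z,z',y)$, we have

$$\|\mathbf{P}^{z_-}_y - \mathbf{P}^{z'_-}_y\|_{\mathcal{F}^Z_{n,\infty}} \xrightarrow{n\to\infty} 0.$$

2. For $(\mathbf{P}_y\otimes\mathbf{P}_y)\mathbf{Y}$-a.e. $(z,z',y)$, we have

$$\mathbf{P}^{z_-}_y(A) = \mathbf{P}^{z_-}_y(A)^2 = \mathbf{P}^{z'_-}_y(A) \quad\text{for all } A\in\mathcal{A}^Z.$$

3. For $(\mathbf{P}_y\otimes\mathbf{P}_y)\mathbf{Y}$-a.e. $(z,z',y)$, there exists $n\ge 0$ such that $\mathbf{P}^{z_-}_y$ and
$\mathbf{P}^{z'_-}_y$ are not mutually singular on $\mathcal{F}^Z_{n,\infty}$.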

The proof of Theorem 3.1 is similar in spirit to that of Theorem 2.2. How- 
ever, some care is needed in the handling of regular conditional probabilities. 
We begin by establishing the following stationarity result. 

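Lemma 3.2. For $\mathbf{P}$-a.e. $(z,y)$, we have

$$\mathbf{P}^{z_-\circ\Theta^n}_{\Theta^n y}(A) = \mathbf{E}^{z_-}_y\big[\mathbf{1}_A\circ\Theta^n\,\big|\,\mathcal{F}^Z_{-\infty,n}\big] \quad \mathbf{P}^{z_-}_y\text{-a.s.} \quad\text{for every } A\in\mathcal{F}^Z,\ n\ge 0.$$

Proof. Fix $A\in\mathcal{F}^Z$, $B\in\mathcal{F}^Z_{-\infty,n}$, and $C\in\mathcal{F}^Y\vee\mathcal{F}^Z_{-\infty,0}$. Then

$$\mathbf{E}\big[\mathbf{E}^{Z_-}_Y\big[\mathbf{P}^{Z_-\circ\Theta^n}_{\Theta^n Y}(A)\,\mathbf{1}_B\big]\mathbf{1}_C\big]
= \mathbf{E}\big[\mathbf{P}^{Z_-\circ\Theta^n}_{\Theta^n Y}(A)\,\mathbf{1}_{B\cap C}\big]
= \mathbf{E}\big[\mathbf{P}^{Z_-}_Y(A)\,\{\mathbf{1}_{B\cap C}\circ\Theta^{-n}\}\big]$$
$$= \mathbf{E}\big[\mathbf{1}_A\{\mathbf{1}_{B\cap C}\circ\Theta^{-n}\}\big]
= \mathbf{E}\big[\{\mathbf{1}_A\circ\Theta^n\}\mathbf{1}_{B\cap C}\big]
= \mathbf{E}\big[\mathbf{E}^{Z_-}_Y[\{\mathbf{1}_A\circ\Theta^n\}\mathbf{1}_B]\,\mathbf{1}_C\big],$$

where we have used the stationarity of $\mathbf{P}$ and the definition of the regular
conditional probability $\mathbf{P}^{z_-}_y$. As this holds for all $C\in\mathcal{F}^Y\vee\mathcal{F}^Z_{-\infty,0}$, we have

$$\mathbf{E}^{z_-}_y\big[\mathbf{P}^{Z_-\circ\Theta^n}_{\Theta^n y}(A)\,\mathbf{1}_B\big] = \mathbf{E}^{z_-}_y\big[\{\mathbf{1}_A\circ\Theta^n\}\mathbf{1}_B\big]$$

for $\mathbf{P}$-a.e. $(z,y)$. But as $\mathcal{F}^Z$ and $\mathcal{F}^Z_{-\infty,n}$ are countably generated, we can
ensure that for $\mathbf{P}$-a.e. $(z,y)$, this equality holds simultaneously for all sets $A$
and $B$ in a countable generating algebra for $\mathcal{F}^Z$ and $\mathcal{F}^Z_{-\infty,n}$, respectively. By
the monotone class theorem, it follows that for $\mathbf{P}$-a.e. $(z,y)$ the equality holds
for all $A\in\mathcal{F}^Z$ and $B\in\mathcal{F}^Z_{-\infty,n}$ simultaneously, which yields the claim. □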
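In the following, we define

$$\beta_n(z_-,z'_-,y) = \|\mathbf{P}^{z_-}_y - \mathbf{P}^{z'_-}_y\|_{\mathcal{F}^Z_{n,\infty}}, \qquad \beta(z_-,z'_-,y) = \lim_{n\to\infty}\beta_n(z_-,z'_-,y).$$

The limit in the definition of $\beta$ exists pointwise as $\beta_n$ is decreasing in $n$. As
$\mathcal{F}^Z_{n,\infty}$ is countably generated, the maps $\beta_n$ and $\beta$ are measurable.
Define the probability $\mathbf{Q} = (\mathbf{P}_y\otimes\mathbf{P}_y)\mathbf{Y}$ on $G^{\mathbb{Z}}\times G^{\mathbb{Z}}\times F^{\mathbb{Z}}$, that is,

$$\int \mathbf{1}_A(z,z',y)\,\mathbf{Q}(dz,dz',dy) = \int \mathbf{1}_A(z,z',y)\,\mathbf{P}_y(dz)\,\mathbf{P}_y(dz')\,\mathbf{Y}(dy).$$

Denote the coordinate process on $G^{\mathbb{Z}}\times G^{\mathbb{Z}}\times F^{\mathbb{Z}}$ as $(Z_n,Z'_n,Y_n)_{n\in\mathbb{Z}}$. Evidently
$\mathbf{Q}$ is a coupling of two copies of $\mathbf{P}$ such that the observations $Y$ coincide and
$Z, Z'$ are conditionally independent given $Y$. Moreover, as $\mathbf{Y}$ is stationary
and $\mathbf{P}_{\Theta y}(A) = \mathbf{P}[A|\mathcal{F}^Y]\circ\Theta = \mathbf{E}[\mathbf{1}_A\circ\Theta|\mathcal{F}^Y] = \mathbf{E}_y(\mathbf{1}_A\circ\Theta)$ $\mathbf{P}$-a.s., it is
easily seen that $\mathbf{Q}$ is also a stationary measure. Finally, define

$$\mathbf{Q}^{z_-,z'_-}_y = \mathbf{P}^{z_-}_y\otimes\mathbf{P}^{z'_-}_y\otimes\delta_y \quad\text{on } G^{\mathbb{Z}}\times G^{\mathbb{Z}}\times F^{\mathbb{Z}}.$$

It follows directly that $\mathbf{Q}^{z_-,z'_-}_y$ is a version of the regular conditional
probability $\mathbf{Q}[\,\cdot\,|\mathcal{F}^Y\vee\mathcal{F}^Z_{-\infty,0}\vee\mathcal{F}^{Z'}_{-\infty,0}]$, where $\mathcal{F}^{Z'}_{m,n} = \sigma\{Z'_{m,n}\}$.
The following lemma establishes the invariance of $\beta$.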

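Lemma 3.3. For $\mathbf{Q}$-a.e. $(z,z',y)$, we have

$$\mathbf{Q}^{z_-,z'_-}_y\big[\beta(z_-,z'_-,y) = \beta(Z_-\circ\Theta^n, Z'_-\circ\Theta^n, \Theta^n y) \text{ for all } n\ge 0\big] = 1.$$

Proof. Define for simplicity $\mathcal{G}_n = \mathcal{F}^Y\vee\mathcal{F}^Z_{-\infty,n}\vee\mathcal{F}^{Z'}_{-\infty,n}$. By Jensen's
inequality and conditional independence, we obtain

$$\beta_k(Z_-,Z'_-,Y) = \|\mathbf{Q}[Z_{k,\infty}\in\cdot\,|\mathcal{G}_0] - \mathbf{Q}[Z'_{k,\infty}\in\cdot\,|\mathcal{G}_0]\|$$
$$\le \mathbf{E}_{\mathbf{Q}}\big[\,\|\mathbf{Q}[Z_{k,\infty}\in\cdot\,|\mathcal{G}_1] - \mathbf{Q}[Z'_{k,\infty}\in\cdot\,|\mathcal{G}_1]\|\,\big|\,\mathcal{G}_0\big]$$
$$= \mathbf{E}_{\mathbf{Q}}\big[\,\|\mathbf{Q}[Z_{k-1,\infty}\in\cdot\,|\mathcal{G}_0] - \mathbf{Q}[Z'_{k-1,\infty}\in\cdot\,|\mathcal{G}_0]\|\circ\Theta\,\big|\,\mathcal{G}_0\big]$$
$$= \mathbf{E}_{\mathbf{Q}}\big[\beta_{k-1}(Z_-,Z'_-,Y)\circ\Theta\,\big|\,\mathcal{G}_0\big].$$

Letting $k\to\infty$ and using stationarity, we find that $M_n = \beta(Z_-,Z'_-,Y)\circ\Theta^n$
is a bounded submartingale under $\mathbf{Q}$. Using stationarity and the martingale
convergence theorem, we find that

$$\mathbf{E}_{\mathbf{Q}}\big[|\beta(Z_-,Z'_-,Y) - \beta(Z_-,Z'_-,Y)\circ\Theta^n|\big] = \mathbf{E}_{\mathbf{Q}}\big[|M_k - M_{n+k}|\big] \xrightarrow{k\to\infty} 0.$$

The result now follows by disintegration. □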
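We are now ready to complete the proof of Theorem 3.1.

Proof of Theorem 3.1. The proofs of 2 ⇒ 1 ⇒ 3 are identical to the
corresponding proofs in Theorem 2.1. It remains to prove 3 ⇒ 2.

Let us assume that condition 3 holds. Then, using Lemmas 3.2 and 3.3,
we can find a measurable subset $H\subseteq G^{\mathbb{Z}}\times G^{\mathbb{Z}}\times F^{\mathbb{Z}}$ of full probability
$\mathbf{Q}(H) = 1$ such that the following hold for every $(z,z',y)\in H$:

a. $\mathbf{P}^{z_-}_y$ and $\mathbf{P}^{z'_-}_y$ are not mutually singular on $\mathcal{F}^Z_{n,\infty}$ for some $n\ge 0$.

b. $\mathbf{P}^{z_-\circ\Theta^n}_{\Theta^n y}(A) = \mathbf{E}^{z_-}_y[\mathbf{1}_A\circ\Theta^n|\mathcal{F}^Z_{-\infty,n}]$ $\mathbf{P}^{z_-}_y$-a.s. for all $A\in\mathcal{F}^Z$, $n\ge 0$.

c. $\mathbf{P}^{z'_-\circ\Theta^n}_{\Theta^n y}(A) = \mathbf{E}^{z'_-}_y[\mathbf{1}_A\circ\Theta^n|\mathcal{F}^Z_{-\infty,n}]$ $\mathbf{P}^{z'_-}_y$-a.s. for all $A\in\mathcal{F}^Z$, $n\ge 0$.

d. $\mathbf{Q}^{z_-,z'_-}_y[\beta(z_-,z'_-,y) = \beta(Z_-\circ\Theta^n, Z'_-\circ\Theta^n,\Theta^n y)$ for all $n\ge 0] = 1$.

Now suppose that condition 2 does not hold. Then we can choose a path
$(z,z',y)\in H$ and a tail event $A\in\mathcal{A}^Z$ such that

$$\text{either}\quad 0 < \mathbf{P}^{z_-}_y(A) < 1, \quad\text{or}\quad \mathbf{P}^{z_-}_y(A)\ne\mathbf{P}^{z'_-}_y(A).$$

Define $f = \mathbf{1}_A - \mathbf{1}_{A^c}$ and fix $\alpha > 0$. By the martingale convergence theorem,

$$\mathbf{Q}^{z_-,z'_-}_y\Big[\big|\mathbf{E}^{Z_-\circ\Theta^n}_{\Theta^n y}(f\circ\Theta^{-n}) - \mathbf{E}^{Z'_-\circ\Theta^n}_{\Theta^n y}(f\circ\Theta^{-n})\big| > 2-\alpha\Big]
\xrightarrow{n\to\infty} \mathbf{P}^{z_-}_y(A)\mathbf{P}^{z'_-}_y(A^c) + \mathbf{P}^{z_-}_y(A^c)\mathbf{P}^{z'_-}_y(A) > 0.$$

Note that as $|f|\le 1$ and $f\circ\Theta^{-n}$ is $\mathcal{A}^Z$-measurable, we have

$$\big|\mathbf{E}^{Z_-\circ\Theta^n}_{\Theta^n y}(f\circ\Theta^{-n}) - \mathbf{E}^{Z'_-\circ\Theta^n}_{\Theta^n y}(f\circ\Theta^{-n})\big| \le \beta(Z_-\circ\Theta^n, Z'_-\circ\Theta^n,\Theta^n y) \quad \mathbf{Q}^{z_-,z'_-}_y\text{-a.s.}$$

It follows that $\beta(z_-,z'_-,y)\ge 2-\alpha$, and we therefore have $\beta(z_-,z'_-,y) = 2$
as $\alpha > 0$ was arbitrary. But by construction there exists $n\ge 0$ such that
$\beta(z_-,z'_-,y)\le\beta_n(z_-,z'_-,y) < 2$, which entails the desired contradiction. □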

3.2. Nondegeneracy and inheritance. While Theorem 3.1 in principle 
characterizes completely the conditional absolute regularity property, this 
result is difficult to apply directly as an explicit description of $\mathbf{P}^{z_-}_y$ (beyond
its existence as a regular conditional probability) is typically not available. 
On the other hand, many methods are available to establish the absolute 
regularity property of the unconditional model {Z, Y). We will presently de- 
velop a technique that allows us to deduce the conditional absolute regularity 
property from absolute regularity of the unconditional model. 
The essential assumption that will be needed for inheritance of the absolute
regularity property is nondegeneracy of the observations. Roughly
speaking, this requirement ensures that we cannot infer with certainty the
outcome of any unobserved event in $\mathcal{F}^Z$ given a finite number of observations
$Y_m,\ldots,Y_n$. The nondegeneracy assumption is typically easy to verify
in practice from the model description (see section 4, for example). 

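Definition 3.4 (Nondegeneracy). The process $(Z_k,Y_k)_{k\in\mathbb{Z}}$ is said to be
nondegenerate if for every $-\infty < m\le n < \infty$ we have

$$\mathbf{P}[Y_{m,n}\in\cdot\,|\mathcal{F}_{-\infty,m-1}\vee\mathcal{F}_{n+1,\infty}] \sim \mathbf{P}[Y_{m,n}\in\cdot\,|\mathcal{F}^Y_{-\infty,m-1}\vee\mathcal{F}^Y_{n+1,\infty}] \quad \mathbf{P}\text{-a.s.}$$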

We now state the main result of this section (recall that the definition of 
absolute regularity was given as Definition 2.7 above). 

Theorem 3.5 (Inheritance of absolute regularity). Suppose that the stationary
process $(Z_k,Y_k)_{k\in\mathbb{Z}}$ is absolutely regular and nondegenerate. Then
any (hence all) of the conditions of Theorem 3.1 hold true. 

To prove this result, we need three lemmas. The first lemma is a basic 
result on the equivalence of regular conditional probabilities. 

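Lemma 3.6. Let $H_1, H_2, H_3$ be Polish spaces, and let $\mathbf{R}$ be a probability
on $H_1\times H_2\times H_3$. Define $X_i(x_1,x_2,x_3) = x_i$ and $\mathcal{G}_i = \sigma\{X_i\}$. Then

$$\mathbf{R}[X_1\in\cdot\,|\mathcal{G}_2\vee\mathcal{G}_3] \sim \mathbf{R}[X_1\in\cdot\,|\mathcal{G}_2] \quad \mathbf{R}\text{-a.s.}$$

if and only if

$$\mathbf{R}[X_3\in\cdot\,|\mathcal{G}_1\vee\mathcal{G}_2] \sim \mathbf{R}[X_3\in\cdot\,|\mathcal{G}_2] \quad \mathbf{R}\text{-a.s.}$$

Proof. This follows from [40, Lemma 3.6] and the existence of a measurable
version of the Radon-Nikodym density between kernels [10, V.58]. □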

The second is essentially the tower property of conditional expectations. 

Lemma 3.7. Let $X_1, X_2, X_3$ be random variables taking values in Polish
spaces $H_1, H_2, H_3$, respectively, and define the σ-fields $\mathcal{G}_i = \sigma\{X_i\}$. Moreover,
let $K : H_2\times\mathcal{B}(H_1)\to[0,1]$ be a transition kernel, and suppose that

$$\mathbf{P}[X_1\in\cdot\,|\mathcal{G}_2\vee\mathcal{G}_3] \sim K(X_2,\cdot) \quad \mathbf{P}\text{-a.s.}$$

Then we also have

$$\mathbf{P}[X_1\in\cdot\,|\mathcal{G}_2] \sim K(X_2,\cdot) \quad \mathbf{P}\text{-a.s.}$$

Proof. Using existence of a measurable version of the Radon-Nikodym
density between kernels [10, V.58], the assumption implies that there is a
measurable function $h : H_1\times H_2\times H_3\to\,]0,\infty[$ such that

$$\mathbf{P}[X_1\in A|\mathcal{G}_2\vee\mathcal{G}_3] = \int \mathbf{1}_A(x)\,h(x,X_2,X_3)\,K(X_2,dx) \quad \mathbf{P}\text{-a.s.}$$

for every $A\in\mathcal{B}(H_1)$. By the tower property

$$\mathbf{P}[X_1\in A|\mathcal{G}_2] = \int \mathbf{1}_A(x)\int h(x,X_2,x')\,\mathbf{P}_{X_2}(dx')\,K(X_2,dx) \quad \mathbf{P}\text{-a.s.},$$

where we fix a version of the conditional probability $\mathbf{P}_{X_2} = \mathbf{P}[X_3\in\cdot\,|\mathcal{G}_2]$.
As $\mathcal{B}(H_1)$ is countably generated, the $\mathbf{P}$-exceptional set can be chosen
independent of $A$ by a monotone class argument. This yields the claim. □

The third lemma is a total variation bound for conditional probabilities. 

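Lemma 3.8. Let $H_1, H_2, H_3$ be Polish spaces, and let $\mathbf{R}$ be a probability
on $H_1\times H_2\times H_3$. Define $X_i(x_1,x_2,x_3) = x_i$ and $\mathcal{G}_i = \sigma\{X_i\}$. Then

$$\mathbf{E}_{\mathbf{R}}\big[\|\mathbf{R}[X_1\in\cdot\,|\mathcal{G}_2\vee\mathcal{G}_3] - \mathbf{R}[X_1\in\cdot\,|\mathcal{G}_3]\|\big]
\le 2\,\mathbf{E}_{\mathbf{R}}\big[\|\mathbf{R}[X_1,X_3\in\cdot\,|\mathcal{G}_2] - \mathbf{R}[X_1,X_3\in\cdot\,]\|\big].$$

Proof. We fix versions of the regular conditional probabilities $\mathbf{R}_{x_2,x_3} =
\mathbf{R}[X_1\in\cdot\,|\mathcal{G}_2\vee\mathcal{G}_3]$, $\mathbf{R}_{x_3} = \mathbf{R}[X_1\in\cdot\,|\mathcal{G}_3]$, and $\mathbf{R}^{x_2} = \mathbf{R}[X_1,X_3\in\cdot\,|\mathcal{G}_2]$, and
let $\mathbf{R}^{x_2}_3 = \mathbf{R}^{x_2}[X_3\in\cdot\,]$. First, we claim that for $\mathbf{R}$-a.e. $x_2$

$$\mathbf{R}^{x_2}(A) = \int \mathbf{1}_A(x_1,x_3)\,\mathbf{R}_{x_2,x_3}(dx_1)\,\mathbf{R}^{x_2}_3(dx_3) \quad\text{for all } A\in\mathcal{G}_1\vee\mathcal{G}_3.$$

Indeed, as $\mathbf{E}[\mathbf{1}_A|\mathcal{G}_2] = \mathbf{E}[\mathbf{E}[\mathbf{1}_A|\mathcal{G}_2\vee\mathcal{G}_3]|\mathcal{G}_2]$, the statement holds a.s. for all $A$
in a countable generating algebra for $\mathcal{G}_1\vee\mathcal{G}_3$, and the claim follows by the
monotone class theorem. Now let $\tilde{\mathbf{R}}^{x_2}(dx_1,dx_3) = \mathbf{R}_{x_3}(dx_1)\,\mathbf{R}^{x_2}_3(dx_3)$. Then

$$\int \|\mathbf{R}_{x_2,x_3} - \mathbf{R}_{x_3}\|\,\mathbf{R}^{x_2}_3(dx_3) = \|\mathbf{R}^{x_2} - \tilde{\mathbf{R}}^{x_2}\| \le \|\mathbf{R}^{x_2} - \mathbf{R}\|_{\mathcal{G}_1\vee\mathcal{G}_3} + \|\mathbf{R}^{x_2} - \mathbf{R}\|_{\mathcal{G}_3}.$$

The proof is easily completed by integrating both sides. □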
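
On finite spaces the bound of Lemma 3.8 reduces to an inequality between sums over a joint probability mass function, which can be checked directly. The sketch below is an illustration only, with hypothetical helper names: it draws random strictly positive joint laws $r(x_1,x_2,x_3)$ and evaluates both sides, using that the left-hand side equals $\sum |r(x_1,x_2,x_3) - r(x_1,x_3)\,r(x_2,x_3)/r(x_3)|$ and the right-hand side equals $2\sum |r(x_1,x_2,x_3) - r(x_1,x_3)\,r(x_2)|$.

```python
import itertools
import random


def random_joint(n1, n2, n3, rng):
    """Random strictly positive pmf on {0..n1-1} x {0..n2-1} x {0..n3-1}."""
    keys = itertools.product(range(n1), range(n2), range(n3))
    weights = {k: rng.random() + 0.01 for k in keys}
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}


def lemma_sides(r, n1, n2, n3):
    """Return (lhs, rhs) of the total variation bound for the joint pmf r."""
    p23 = {(b, c): sum(r[a, b, c] for a in range(n1))
           for b in range(n2) for c in range(n3)}
    p13 = {(a, c): sum(r[a, b, c] for b in range(n2))
           for a in range(n1) for c in range(n3)}
    p2 = {b: sum(p23[b, c] for c in range(n3)) for b in range(n2)}
    p3 = {c: sum(p23[b, c] for b in range(n2)) for c in range(n3)}
    # E || R[X1 | X2, X3] - R[X1 | X3] ||
    lhs = sum(abs(r[a, b, c] - p13[a, c] * p23[b, c] / p3[c]) for (a, b, c) in r)
    # 2 E || R[(X1, X3) | X2] - R[(X1, X3)] ||
    rhs = 2 * sum(abs(r[a, b, c] - p13[a, c] * p2[b]) for (a, b, c) in r)
    return lhs, rhs


rng = random.Random(1)
for _ in range(5):
    r = random_joint(2, 3, 2, rng)
    print(lemma_sides(r, 2, 3, 2))
```

In each printed pair the first entry should not exceed the second, in line with the two-step triangle inequality used in the proof.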
We now proceed to the proof of Theorem 3.5. 
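Proof of Theorem 3.5. The nondegeneracy assumption states

$$\mathbf{P}[Y_{1,n-1}\in\cdot\,|\mathcal{F}_{-\infty,0}\vee\mathcal{F}_{n,\infty}] \sim \mathbf{P}[Y_{1,n-1}\in\cdot\,|\mathcal{F}^Y_{-\infty,0}\vee\mathcal{F}^Y_{n,\infty}] \quad \mathbf{P}\text{-a.s.}$$

Therefore, by Lemma 3.7, we obtain

$$\mathbf{P}[Y_{1,n-1}\in\cdot\,|\mathcal{F}_{-\infty,0}\vee\mathcal{F}^Y_{n,\infty}] \sim \mathbf{P}[Y_{1,n-1}\in\cdot\,|\mathcal{F}^Y_{-\infty,0}\vee\mathcal{F}^Y_{n,\infty}] \quad \mathbf{P}\text{-a.s.}$$

It follows that

$$\mathbf{P}[Y_{1,n-1}\in\cdot\,|\mathcal{F}_{-\infty,0}\vee\mathcal{F}_{n,\infty}] \sim \mathbf{P}[Y_{1,n-1}\in\cdot\,|\mathcal{F}_{-\infty,0}\vee\mathcal{F}^Y_{n,\infty}] \quad \mathbf{P}\text{-a.s.},$$

which yields using Lemma 3.6

$$\mathbf{P}[Z_{n,\infty}\in\cdot\,|\mathcal{F}^Y\vee\mathcal{F}^Z_{-\infty,0}] \sim \mathbf{P}[Z_{n,\infty}\in\cdot\,|\mathcal{F}_{-\infty,0}\vee\mathcal{F}^Y_{n,\infty}] \quad \mathbf{P}\text{-a.s.}$$

Therefore, if we choose any version $\mathbf{P}^{z_-}_{y,n} = \mathbf{P}[Z\in\cdot\,|\mathcal{F}^Y_{-\infty,0}\vee\mathcal{F}^Y_{n,\infty}\vee\mathcal{F}^Z_{-\infty,0}]$, then

$$\mathbf{P}^{z_-}_y\big|_{\mathcal{F}^Z_{n,\infty}} \sim \mathbf{P}^{z_-}_{y,n}\big|_{\mathcal{F}^Z_{n,\infty}} \quad\text{for all } n\ge 1 \quad\text{for } \mathbf{P}\text{-a.e. } (z,y)$$

(note that we define $\mathbf{P}^{z_-}_{y,n}$ as a function of the entire path $y$ for simplicity
of notation; by construction, $\mathbf{P}^{z_-}_{y,n}$ depends on $y_{-\infty,0}, y_{n,\infty}, z_-$ only). By
condition 3 of Theorem 3.1, to complete the proof it suffices to show that

$$\inf_n \|\mathbf{P}^{z_-}_{y,n} - \mathbf{P}^{z'_-}_{y,n}\|_{\mathcal{F}^Z_{n,\infty}} < 2 \quad\text{for } \mathbf{Q}\text{-a.e. } (z,z',y),$$

where we define $\mathbf{Q} = (\mathbf{P}_y\otimes\mathbf{P}_y)\mathbf{Y}$ as in section 3.1.

Fix versions of the regular conditional probabilities $\mathbf{P}_{y,n} = \mathbf{P}[Z\in\cdot\,|\mathcal{F}^Y_{n,\infty}]$
and $\mathbf{P}^{x_-} = \mathbf{P}[\,\cdot\,|\mathcal{F}_{-\infty,0}]$ (where $X = (Z,Y)$). By Lemma 3.8, we have

$$\mathbf{E}_{\mathbf{Q}}\big[\|\mathbf{P}^{Z_-}_{Y,n} - \mathbf{P}^{Z'_-}_{Y,n}\|_{\mathcal{F}^Z_{n,\infty}}\big]
\le 2\,\mathbf{E}\big[\|\mathbf{P}^{Z_-}_{Y,n} - \mathbf{P}_{Y,n}\|_{\mathcal{F}^Z_{n,\infty}}\big]
\le 4\,\mathbf{E}\big[\|\mathbf{P}^{X_-} - \mathbf{P}\|_{\mathcal{F}_{n,\infty}}\big].$$

As $X = (Z,Y)$ is absolutely regular, Corollary 2.8 gives

$$\mathbf{E}_{\mathbf{Q}}\Big[\inf_n \|\mathbf{P}^{Z_-}_{Y,n} - \mathbf{P}^{Z'_-}_{Y,n}\|_{\mathcal{F}^Z_{n,\infty}}\Big]
\le 4\inf_n \mathbf{E}\big[\|\mathbf{P}^{X_-} - \mathbf{P}\|_{\mathcal{F}_{n,\infty}}\big] = 0.$$

Thus the requisite property is established. □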
3.3. One-sided observations. Theorem 3.1 establishes conditional ergodicity
of $Z$ given the entire observation σ-field $\mathcal{F}^Y$. This allows us to control
the behavior of the conditional distributions $\mathbf{P}[Z_n\in\cdot\,|\mathcal{F}^Y]$ as $n\to\infty$.
In contrast, the ergodic theory of nonlinear filters (section 4) is concerned
with the "causal" setting where one considers the conditional distributions
$\mathbf{P}[Z_n\in\cdot\,|\mathcal{F}^Y_{0,n}]$ as $n\to\infty$. The latter requires a one-sided version of our
results where we only condition on $\mathcal{F}^Y_+ = \mathcal{F}^Y_{0,\infty}$. Unfortunately, two-sided
conditioning was essential to obtain a conditional zero-two law: if we had
replaced $\mathcal{F}^Y$ by $\mathcal{F}^Y_+$ in section 3.1, for example, then the coupled measure $\mathbf{Q}$
would be nonstationary and the key Lemmas 3.2 and 3.3 would fail.

We must therefore develop an additional technique to deduce one-sided 
conditional ergodicity results from their two-sided counterparts. To this end 
we prove the following result, which will suffice for our purposes. 

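Proposition 3.9 (One-sided conditioning). Suppose that the stationary
process $(Z_k,Y_k)_{k\in\mathbb{Z}}$ is absolutely regular and nondegenerate. Then

$$\mathbf{P}[Z_{0,\infty}\in\cdot\,|\mathcal{F}^Y_+] \sim \mathbf{P}[Z_{0,\infty}\in\cdot\,|\mathcal{F}^Y] \quad \mathbf{P}\text{-a.s.}$$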

Before we prove this result, let us use it to establish the key σ-field identity
that will be needed in the ergodic theory of nonlinear filters (section 4). 

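Corollary 3.10. Suppose that the stationary process $(Z_k,Y_k)_{k\in\mathbb{Z}}$ is
absolutely regular and nondegenerate. Then the following holds:

$$\bigcap_{n\ge 0} \mathcal{F}^Y_+\vee\mathcal{F}^Z_{n,\infty} = \mathcal{F}^Y_+ \quad \mathrm{mod}\ \mathbf{P}.$$

Proof. Using a monotone class argument as in the proof of Lemma 3.8,

$$\int \mathbf{P}^{z_-}_y(A)\,\mathbf{P}_y(dz) = \mathbf{P}_y(A) \quad\text{for all } A\in\mathcal{F}^Z$$

holds for $\mathbf{Y}$-a.e. $y$. Therefore, by Theorems 3.5 and 3.1, we find that

$$\|\mathbf{P}^{z_-}_y - \mathbf{P}_y\|_{\mathcal{F}^Z_{n,\infty}} \xrightarrow{n\to\infty} 0$$

for $\mathbf{P}$-a.e. $(z,y)$. As $\|\mathbf{P}^{z_-}_y - \mathbf{P}_y\|_{\mathcal{F}^Z_{n,\infty}} \to \|\mathbf{P}^{z_-}_y - \mathbf{P}_y\|_{\mathcal{A}^Z}$ as $n\to\infty$, applying
again Theorems 3.5 and 3.1 shows that $\mathcal{A}^Z$ is $\mathbf{P}_y$-trivial for $\mathbf{Y}$-a.e. $y$.

Fix a version of the regular conditional probability $\mathbf{P}_{y_+} = \mathbf{P}[\,\cdot\,|\mathcal{F}^Y_+]$, where
$y_+ = (y_k)_{k\ge 0}$. By Proposition 3.9, $\mathbf{P}_y$ and $\mathbf{P}_{y_+}$ are equivalent on $\mathcal{F}^Z_{0,\infty}$ for
$\mathbf{Y}$-a.e. $y$. It follows that $\mathcal{A}^Z$ is also $\mathbf{P}_{y_+}$-trivial for $\mathbf{Y}$-a.e. $y$. In particular,

$$\mathbf{P}_{y_+}[A|\mathcal{F}^Z_{n,\infty}] \xrightarrow{n\to\infty} \mathbf{P}_{y_+}[A] \quad \mathbf{P}_{y_+}\text{-a.s.} \quad\text{for } \mathbf{Y}\text{-a.e. } y$$

holds for any $A\in\mathcal{F}$. By Lemma 3.11 below, this implies that

$$\mathbf{P}[A|\mathcal{F}^Y_+\vee\mathcal{F}^Z_{n,\infty}] \xrightarrow{n\to\infty} \mathbf{P}[A|\mathcal{F}^Y_+] \quad \mathbf{P}\text{-a.s.} \quad\text{for every } A\in\mathcal{F}.$$

This evidently establishes the claim. □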

We used above the following basic fact on repeated conditioning [46, p. 
95-96]. We omit the proof, which is essentially along the lines of Lemma 3.2. 

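Lemma 3.11. Let $H_1, H_2, H_3$ be Polish spaces, let $\mathbf{R}$ be a probability on
$H_1\times H_2\times H_3$ and $X_i(x_1,x_2,x_3) = x_i$, $\mathcal{G}_i = \sigma\{X_i\}$. Choose any versions of
the regular conditional probabilities $\mathbf{R}_{x_1} = \mathbf{R}[X_2,X_3\in\cdot\,|\mathcal{G}_1]$ and $\mathbf{R}_{x_1,x_2} =
\mathbf{R}[X_3\in\cdot\,|\mathcal{G}_1\vee\mathcal{G}_2]$, and let $\mathbf{R}^1 = \mathbf{R}[X_1\in\cdot\,]$. Then for $\mathbf{R}^1$-a.e. $x_1\in H_1$,

$$\mathbf{R}_{x_1,x_2}(A) = \mathbf{R}_{x_1}[X_3\in A|\mathcal{G}_2] \quad \mathbf{R}_{x_1}\text{-a.s.} \quad\text{for all } A\in\mathcal{B}(H_3).$$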

We now turn to the proof of Proposition 3.9. The essential difficulty is 
that we must show equivalence of two measures on an infinite time interval. 
The following lemma provides a simple tool for this purpose. 

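Lemma 3.12. Let $H$ be a Polish space, and let $\mu,\nu$ be probability measures
on $H^{\mathbb{N}}$. Denote by $X_i : H^{\mathbb{N}}\to H$ the coordinate projections $X_i(x) = x_i$,
and define the σ-fields $\mathcal{G}_{m,n} = \sigma\{X_{m,n}\}$ for $m\le n$. If we have

$$\mu[X_{1,n-1}\in\cdot\,|\mathcal{G}_{n,\infty}] \ll \nu[X_{1,n-1}\in\cdot\,|\mathcal{G}_{n,\infty}] \quad \mu|_{\mathcal{G}_{n,\infty}}\wedge\nu|_{\mathcal{G}_{n,\infty}}\text{-a.s.}$$

for all $n < \infty$, and if in addition

$$\|\mu-\nu\|_{\mathcal{G}_{n,\infty}} \xrightarrow{n\to\infty} 0,$$

then $\mu\ll\nu$ on $\mathcal{G}_{1,\infty}$.

Proof. Let $\mu_n = \mu|_{\mathcal{G}_{n,\infty}}$ and $\nu_n = \nu|_{\mathcal{G}_{n,\infty}}$. Choose any $A\in\mathcal{G}_{1,\infty}$ such
that $\nu(A) = 0$. Then $\nu[A|\mathcal{G}_{n,\infty}] = 0$ $\nu_n$-a.s., and therefore $\mu[A|\mathcal{G}_{n,\infty}] = 0$
$\mu_n\wedge\nu_n$-a.s. by the first assumption. But using the second assumption

$$\mu(A) = \mathbf{E}_{\mu_n\wedge\nu_n}\big[\mu[A|\mathcal{G}_{n,\infty}]\big] + \mathbf{E}_{\mu_n-\mu_n\wedge\nu_n}\big[\mu[A|\mathcal{G}_{n,\infty}]\big] \le 1 - \|\mu_n\wedge\nu_n\| \xrightarrow{n\to\infty} 0,$$

where we used $\|\mu-\nu\|_{\mathcal{G}_{n,\infty}} = 2(1-\|\mu_n\wedge\nu_n\|)$. Thus $\mu\ll\nu$ on $\mathcal{G}_{1,\infty}$. □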

We can now complete the proof of Proposition 3.9. 
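Proof of Proposition 3.9. By Lemma 3.6, it suffices to show that

$$\mathbf{P}[Y_{-\infty,-1}\in\cdot\,|\mathcal{F}^Y_+] \sim \mathbf{P}[Y_{-\infty,-1}\in\cdot\,|\mathcal{F}_+] \quad \mathbf{P}\text{-a.s.}$$

Fix versions of the regular conditional probabilities

$$\mathbf{P}_{y_{0,\infty}} = \mathbf{P}[Y_{-\infty,-1}\in\cdot\,|\mathcal{F}^Y_+], \qquad \mathbf{P}_{x_{0,\infty}} = \mathbf{P}[Y_{-\infty,-1}\in\cdot\,|\mathcal{F}_+],$$
$$\mathbf{P}_{y_{0,\infty},y_{-\infty,-n}} = \mathbf{P}[Y_{-\infty,-1}\in\cdot\,|\mathcal{F}^Y_+\vee\mathcal{F}^Y_{-\infty,-n}],$$
$$\mathbf{P}_{x_{0,\infty},y_{-\infty,-n}} = \mathbf{P}[Y_{-\infty,-1}\in\cdot\,|\mathcal{F}_+\vee\mathcal{F}^Y_{-\infty,-n}].$$

We will show that $\mathbf{P}_{y_{0,\infty}} \sim \mathbf{P}_{x_{0,\infty}}$ $\mathbf{P}$-a.s. by applying Lemma 3.12.
First, we claim that

$$\|\mathbf{P}_{y_{0,\infty}} - \mathbf{P}_{x_{0,\infty}}\|_{\mathcal{F}^Y_{-\infty,-n}} \xrightarrow{n\to\infty} 0 \quad\text{for } \mathbf{P}\text{-a.e. } x = (z,y).$$

Indeed, note that by the triangle inequality and Jensen's inequality

$$\|\mathbf{P}[Y_{-\infty,-m}\in\cdot\,|\mathcal{F}_+] - \mathbf{P}[Y_{-\infty,-m}\in\cdot\,|\mathcal{F}^Y_+]\|
\le \|\mathbf{P}[\,\cdot\,|\mathcal{F}_+] - \mathbf{P}\|_{\mathcal{F}^Y_{-\infty,-m}} + \mathbf{E}\big[\|\mathbf{P}[\,\cdot\,|\mathcal{F}_+] - \mathbf{P}\|_{\mathcal{F}^Y_{-\infty,-m}}\,\big|\,\mathcal{F}^Y_+\big].$$

By Corollary 2.8, it suffices to show that the time-reversed process $(X_{-k})_{k\in\mathbb{Z}}$
is absolutely regular. But it is clear from Definition 2.7 that the absolute
regularity property of a stationary sequence is invariant under time reversal.
As we assumed absolute regularity of $(X_k)_{k\in\mathbb{Z}}$, the claim follows.
Next, we claim that for $\mathbf{P}$-a.e. $x = (z,y)$

$$\mathbf{P}_{y_{0,\infty},y_{-\infty,-m}} \sim \mathbf{P}_{x_{0,\infty},y_{-\infty,-m}} \quad \mathbf{P}_{y_{0,\infty}}\big|_{\mathcal{F}^Y_{-\infty,-m}}\wedge\mathbf{P}_{x_{0,\infty}}\big|_{\mathcal{F}^Y_{-\infty,-m}}\text{-a.s.}$$

Indeed, as in the proof of Theorem 3.5, the nondegeneracy assumption yields

$$\mathbf{P}[Y_{-\infty,-1}\in\cdot\,|\mathcal{F}^Y_{-\infty,-m}\vee\mathcal{F}_+] \sim \mathbf{P}[Y_{-\infty,-1}\in\cdot\,|\mathcal{F}^Y_{-\infty,-m}\vee\mathcal{F}^Y_+] \quad \mathbf{P}\text{-a.s.}$$

Thus there is a measurable set $H$ of pairs $(x_{0,\infty},y_{-\infty,-m})$ with $\mathbf{P}[(X_{0,\infty},Y_{-\infty,-m})\in H] = 1$
such that $\mathbf{P}_{y_{0,\infty},y_{-\infty,-m}} \sim \mathbf{P}_{x_{0,\infty},y_{-\infty,-m}}$ for all $(x_{0,\infty},y_{-\infty,-m})\in H$, and

$$\int \mathbf{1}_H(x_{0,\infty},y_{-\infty,-m})\,\mathbf{P}_{x_{0,\infty}}(dy) = 1 \quad\text{for } \mathbf{P}\text{-a.e. } x.$$

It follows that $\mathbf{P}_{y_{0,\infty},Y_{-\infty,-m}} \sim \mathbf{P}_{x_{0,\infty},Y_{-\infty,-m}}$ holds $\mathbf{P}_{x_{0,\infty}}|_{\mathcal{F}^Y_{-\infty,-m}}$-a.s., and
thus a fortiori $\mathbf{P}_{y_{0,\infty}}|_{\mathcal{F}^Y_{-\infty,-m}}\wedge\mathbf{P}_{x_{0,\infty}}|_{\mathcal{F}^Y_{-\infty,-m}}$-a.s., for $\mathbf{P}$-a.e. $x$.

Finally, using Lemma 3.11 we conclude that for $\mathbf{P}$-a.e. $x = (z,y)$, we have
verified the assumptions of Lemma 3.12 for the measures $\mathbf{P}_{y_{0,\infty}}$ and $\mathbf{P}_{x_{0,\infty}}$.
Thus we have shown $\mathbf{P}_{y_{0,\infty}} \sim \mathbf{P}_{x_{0,\infty}}$ $\mathbf{P}$-a.s., and the proof is complete. □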
4. Ergodicity of the filter. Let $(X_n,Y_n)_{n\ge 0}$ be a Markov chain. We
interpret $(X_n)_{n\ge 0}$ as the unobserved component of the model, while $(Y_n)_{n\ge 0}$
is the observable process. In this setting, there are two distinct levels on
which the conditional ergodic theory of the model can be investigated.

In the previous section, we investigated directly the ergodic properties of
the unobserved process $(X_n)_{n\ge 0}$ conditionally on the observations $(Y_n)_{n\ge 0}$.
This setting is of interest if the entire observation sequence $(Y_n)_{n\ge 0}$ is available
a priori. In contrast, it is often of interest to consider the setting of
causal conditioning, where we wish to infer the current state $X_n$ of the
unobserved process given only the history of observations to date $\mathcal{F}^Y_{0,n}$. The
object of central importance in this setting is the nonlinear filter

$$\pi_n = \mathbf{P}[X_n\in\cdot\,|\mathcal{F}^Y_{0,n}].$$

Evidently, the filtering process $(\pi_n)_{n\ge 0}$ is a measure-valued process that is
adapted to the observation filtration $(\mathcal{F}^Y_{0,n})_{n\ge 0}$. The goal of this section is to
investigate the stability and ergodic properties of the filter $(\pi_n)_{n\ge 0}$.

In section 4.1 we will develop the basic setting and notation to be used 
throughout this section. In section 4.2 we develop a local stability result 
for the nonlinear filter, which is in essence the filtering counterpart of the 
local zero-two laws of section 2.1. In section 4.3 we apply the local stability 
result to develop a number of general stability and ergodicity results for the 
nonlinear filter that are applicable to infinite-dimensional models. Finally, 
in section 4.4 we will extend our results to the continuous time setting. 

The filter stability and ergodicity results developed in this section pro- 
vided the main motivation for the theory developed in this paper; their 
broad applicability will be illustrated in section 5 below. 



4.1. Setting and notation. 



4.1.1. The canonical setup. Throughout this section, we will consider
the bivariate stochastic process $(X_n,Y_n)_{n\in\mathbb{Z}}$, where $X_n$
takes values in the Polish space $E$ and $Y_n$ takes values in the Polish
space $F$. We realize this process on the canonical path space
$\Omega = \Omega^X\times\Omega^Y$ with $\Omega^X = E^{\mathbb{Z}}$ and
$\Omega^Y = F^{\mathbb{Z}}$, such that $X_n(x,y) = x(n)$ and $Y_n(x,y) = y(n)$.
Denote by $\mathcal{F}$ the Borel $\sigma$-field of $\Omega$, and define
$X_{m,n} = (X_k)_{m\le k\le n}$, $Y_{m,n} = (Y_k)_{m\le k\le n}$, and

$$\mathcal{F}^X_{m,n} = \sigma\{X_{m,n}\},\qquad
\mathcal{F}^Y_{m,n} = \sigma\{Y_{m,n}\},\qquad
\mathcal{F}_{m,n} = \mathcal{F}^X_{m,n}\vee\mathcal{F}^Y_{m,n}$$

for $m\le n$. For simplicity of notation, we define the $\sigma$-fields

$$\mathcal{F}^X = \mathcal{F}^X_{-\infty,\infty},\quad
\mathcal{F}^Y = \mathcal{F}^Y_{-\infty,\infty},\quad
\mathcal{F}^X_+ = \mathcal{F}^X_{0,\infty},\quad
\mathcal{F}^Y_+ = \mathcal{F}^Y_{0,\infty},\quad
\mathcal{F}_+ = \mathcal{F}_{0,\infty}.$$

Finally, we denote by $Y$ the $F^{\mathbb{Z}}$-valued random variable
$(Y_k)_{k\in\mathbb{Z}}$, and the canonical shift $\Theta:\Omega\to\Omega$ is
defined as $\Theta(x,y)(m) = (x(m+1),y(m+1))$.

For any Polish space $Z$, we denote by $\mathcal{B}(Z)$ its Borel
$\sigma$-field, and by $\mathcal{P}(Z)$ the space of all probability measures
on $Z$ endowed with the weak convergence topology (thus $\mathcal{P}(Z)$ is
again Polish). Let us recall that any probability kernel
$p: Z\times\mathcal{B}(Z')\to[0,1]$ may be equivalently viewed as a
$\mathcal{P}(Z')$-valued random variable $z\mapsto p(z,\cdot)$ on
$(Z,\mathcal{B}(Z))$. For notational convenience, we will implicitly identify
probability kernels and random probability measures in the sequel. The
notation for total variation distance is as in section 2.

4.1.2. The Markov model. The basic model of this section is defined by
a Markov transition kernel $P: E\times F\times\mathcal{B}(E\times F)\to[0,1]$
on $E\times F$. Denote by $\mathbf{P}^\mu$ the probability measure on
$\mathcal{F}_+$ such that $(X_n,Y_n)_{n\ge 0}$ is a Markov chain with
transition kernel $P$ and initial law
$(X_0,Y_0)\sim\mu\in\mathcal{P}(E\times F)$. For any $(x,y)\in E\times F$, we
will denote for simplicity the law of the Markov chain started at the point
mass $(X_0,Y_0) = (x,y)$ as $\mathbf{P}^{x,y} = \mathbf{P}^{\delta_x\otimes\delta_y}$.

We now impose the following standing assumption. 

Standing Assumption. The Markov transition kernel $P$ admits an
invariant probability measure $\lambda\in\mathcal{P}(E\times F)$, that is,
$\lambda P = \lambda$.

Let us emphasize that we do not rule out at this point the existence of
more than one invariant probability; we simply fix one invariant probability
$\lambda$ in what follows. Our results will be stated in terms of $\lambda$.

Note that by construction, $(X_n,Y_n)_{n\ge 0}$ is a stationary Markov chain
under $\mathbf{P}^\lambda$. We can therefore naturally extend
$\mathbf{P}^\lambda$ to $\mathcal{F}$ such that the two-sided process
$(X_n,Y_n)_{n\in\mathbb{Z}}$ is the stationary Markov chain with invariant
probability $\lambda$ under $\mathbf{P}^\lambda$. For simplicity, we will
frequently write $\mathbf{P} = \mathbf{P}^\lambda$.

4.1.3. The nonlinear filter. The Markov chain $(X_n,Y_n)_{n\ge 0}$ consists
of two components: $(X_n)_{n\ge 0}$ represents the unobservable component of
the model, while $(Y_n)_{n\ge 0}$ represents the observable component. As
$(X_n)_{n\ge 0}$ is presumed to be unobservable, we are interested at time $n$
in the conditional distribution given the observation history to date
$Y_0,\ldots,Y_n$. To this end, we will introduce for every
$\mu\in\mathcal{P}(E\times F)$ the following random measures:

$$\Pi_n^\mu = \mathbf{P}^\mu[\,\cdot\,|\,\mathcal{F}^Y_{0,n}],\qquad
\pi_n^\mu = \mathbf{P}^\mu[X_n\in\cdot\,|\,\mathcal{F}^Y_{0,n}].$$

The $\mathcal{P}(E)$-valued process $(\pi_n^\mu)_{n\ge 0}$ is called the
nonlinear filter started at $\mu$. This is ultimately our main object of
interest. However, we will find it convenient to investigate the full
conditional distributions $\Pi_n^\mu$. When $\mu = \lambda$ is the invariant
measure, we will write $\Pi_n = \Pi_n^\lambda$ and $\pi_n = \pi_n^\lambda$.

Remark 4.1. Note that $\Pi_n^\mu,\pi_n^\mu$ are
$\mathcal{F}^Y_{0,n}$-measurable kernels. That is,
$\pi_n^\mu:\Omega^Y\times\mathcal{B}(E)\to[0,1]$ can be written as
$\pi_n^\mu(A) = \pi_n^\mu[Y_{0,n};A]$ for $A\in\mathcal{B}(E)$. We will mostly
suppress the dependence on $Y_{0,n}$ for notational convenience.

4.2. A local stability result. The main tool that we will develop to
investigate the ergodic theory of nonlinear filters is a local stability
result for the conditional distributions $\Pi_n^\mu$. To this end, we fix in
this subsection a countably generated local $\sigma$-field
$\mathcal{E}^0\subseteq\mathcal{B}(E)$ as in section 2.1, and define

$$\mathcal{F}^0_{m,n} = \mathcal{F}^Y_{m,n}\vee\bigvee_{m\le k\le n} X_k^{-1}(\mathcal{E}^0),\qquad m\le n.$$

Let us emphasize that the localization pertains only to the unobserved
component $X_k$: it is essential for our results that the entire observation
variable $Y_k$ is included in the local filtration $\mathcal{F}^0_{m,n}$. In
practice, this typically implies that the unobserved process may be
infinite-dimensional, but the observed process must be finite-dimensional
(see section 5 for examples).

As in section 3, we require two basic assumptions. The first assumption 
states that the model {Xn, Yn)n>o is locally ergodic. The second assumption 
provides a notion of nondegeneracy that is adapted to the present setting. 

Assumption 4.2 (Local ergodicity). The following holds:

$$\|\mathbf{P}^{x,y}-\mathbf{P}\|_{\mathcal{F}^0_{n,\infty}}\xrightarrow{n\to\infty}0
\quad\text{for }\lambda\text{-a.e. }(x,y)\in E\times F.$$

Assumption 4.3 (Nondegeneracy). There exist Markov transition kernels
$P_0: E\times\mathcal{B}(E)\to[0,1]$ and $Q: F\times\mathcal{B}(F)\to[0,1]$
such that

$$P(x,y,dx',dy') = g(x,y,x',y')\,P_0(x,dx')\,Q(y,dy')$$

for some strictly positive measurable function
$g: E\times F\times E\times F\to\,]0,\infty[$.

Note that Assumption 4.2 is characterized by Theorem 2.2, which yields 
a general tool to verify this assumption. Assumption 4.3 is easily verified in 
practice as it is stated directly in terms of the underlying model. 

The main result of this section is as follows. 

Theorem 4.4 (Local filter stability). Suppose that Assumptions 4.2 and
4.3 hold. Then for any initial probability $\mu\in\mathcal{P}(E\times F)$ such
that

$$\mu(E\times\cdot)\ll\lambda(E\times\cdot)\quad\text{and}\quad
\|\Pi_0^\mu-\Pi_0\|_{\mathcal{F}^0_{n,\infty}}\xrightarrow{n\to\infty}0\quad\mathbf{P}^\mu\text{-a.s.},$$

we have

$$\|\Pi_n^\mu-\Pi_n\|_{\mathcal{F}^0_{n-r,\infty}}\xrightarrow{n\to\infty}0
\quad\mathbf{P}^\mu\text{-a.s. for any }r\in\mathbb{N}.$$

If $\mu(E\times\cdot)\sim\lambda(E\times\cdot)$, the convergence holds also
$\mathbf{P}$-a.s.

Remark 4.5. When interpreting this result, we must take care to ensure
that the relevant quantities are well defined. Recall that $\Pi_n$, as a
regular conditional probability, is defined uniquely up to a
$\mathbf{P}|_{\mathcal{F}^Y_+}$-null set only. As part of the proof, we will
in fact show that under the stated assumptions
$\mathbf{P}^\mu|_{\mathcal{F}^Y_+}\ll\mathbf{P}|_{\mathcal{F}^Y_+}$ (resp.
$\mathbf{P}^\mu|_{\mathcal{F}^Y_+}\sim\mathbf{P}|_{\mathcal{F}^Y_+}$ when
$\mu(E\times\cdot)\sim\lambda(E\times\cdot)$). This ensures that $\Pi_n$ is
$\mathbf{P}^\mu$-a.s. uniquely defined (resp. $\Pi_n^\mu$ is $\mathbf{P}$-a.s.
uniquely defined).

If we strengthen Assumption 4.2 as in Theorem 2.1, we obtain: 

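Corollary 4.6. Suppose that Assumption 4.3 holds and that

$$\|\mathbf{P}^{x,y}-\mathbf{P}\|_{\mathcal{F}^0_{n,\infty}}\xrightarrow{n\to\infty}0
\quad\text{for all }(x,y)\in E\times F.$$

Then for any $\mu\in\mathcal{P}(E\times F)$ such that
$\mu(E\times\cdot)\ll\lambda(E\times\cdot)$, we have

$$\|\Pi_n^\mu-\Pi_n\|_{\mathcal{F}^0_{n-r,\infty}}\xrightarrow{n\to\infty}0
\quad\mathbf{P}^\mu\text{-a.s. for any }r\in\mathbb{N}.$$

If $\mu(E\times\cdot)\sim\lambda(E\times\cdot)$, the convergence holds also
$\mathbf{P}$-a.s.

Proof. The assumption clearly implies Assumption 4.2. Moreover,

$$\|\Pi_0^\mu-\Pi_0\|_{\mathcal{F}^0_{n,\infty}}
\le\|\mathbf{P}^\mu[\,\cdot\,|Y_0]-\mathbf{P}\|_{\mathcal{F}^0_{n,\infty}}
+\|\mathbf{P}[\,\cdot\,|Y_0]-\mathbf{P}\|_{\mathcal{F}^0_{n,\infty}}$$
$$\le\mathbf{E}^\mu\big[\|\mathbf{P}^{X_0,Y_0}-\mathbf{P}\|_{\mathcal{F}^0_{n,\infty}}\big|Y_0\big]
+\mathbf{E}\big[\|\mathbf{P}^{X_0,Y_0}-\mathbf{P}\|_{\mathcal{F}^0_{n,\infty}}\big|Y_0\big].$$

Thus $\|\Pi_0^\mu-\Pi_0\|_{\mathcal{F}^0_{n,\infty}}\to 0$
$\mathbf{P}^\mu$-a.s. It remains to apply Theorem 4.4. □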

We now turn to the proof of Theorem 4.4. We begin with a trivial conse- 
quence of the Bayes formula that we formulate for completeness. 

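Lemma 4.7. Let $\mu,\nu$ be probability measures on a Polish space $H$, and
let $\mathcal{G}\subseteq\mathcal{B}(H)$ be a $\sigma$-field. If $\mu\sim\nu$,
then $\mu[\,\cdot\,|\mathcal{G}]\sim\nu[\,\cdot\,|\mathcal{G}]$ $\mu$-a.s. and
$\nu$-a.s.

Proof. Let $\Lambda = d\mu/d\nu$. Then for any $A\in\mathcal{B}(H)$, we have

$$\mu[A|\mathcal{G}] = \frac{\mathbf{E}_\nu[1_A\Lambda\,|\,\mathcal{G}]}{\mathbf{E}_\nu[\Lambda\,|\,\mathcal{G}]}\quad\mu\text{-a.s.}$$

by the Bayes formula and using $\mu\sim\nu$. As $\mathcal{B}(H)$ is countably
generated, the $\mu$-exceptional set can be chosen independent of $A$ by a
monotone class argument. Thus
$\mu[\,\cdot\,|\mathcal{G}]\sim\nu[\,\cdot\,|\mathcal{G}]$ $\mu$-a.s., and also
$\nu$-a.s. as $\mu\sim\nu$. □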

To proceed, we first prove a key measure-theoretic identity that arises 
from the conditional ergodic theory developed in section 3 above. 
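Lemma 4.8. Suppose that Assumptions 4.2 and 4.3 hold. Then

$$\bigcap_{n\ge 0}\mathcal{F}^Y_+\vee\mathcal{F}^0_{n,\infty} = \mathcal{F}^Y_+\mod\mathbf{P}.$$

Proof. Let $(A_n)_{n\in\mathbb{N}}$ be a countable generating class for
$\mathcal{E}^0$, and define $Z_n = \iota(X_n)$ where
$\iota: E\to\{0,1\}^{\mathbb{N}}$ is given by
$\iota(x) = (1_{A_n}(x))_{n\in\mathbb{N}}$. Then
$\mathcal{F}^0_{m,n} = \mathcal{F}^Z_{m,n}\vee\mathcal{F}^Y_{m,n}$ by
construction. It now suffices to show that the stationary process
$(Z_n,Y_n)_{n\in\mathbb{Z}}$ satisfies the assumptions of Corollary 3.10.

First, note that by Assumption 4.2 and the Markov property

$$\|\mathbf{P}[\,\cdot\,|\mathcal{F}_{-\infty,0}]-\mathbf{P}\|_{\mathcal{F}^0_{n,\infty}}
\le\mathbf{E}\big[\|\mathbf{P}^{X_0,Y_0}-\mathbf{P}\|_{\mathcal{F}^0_{n,\infty}}\big|\mathcal{F}_{-\infty,0}\big]
\xrightarrow{n\to\infty}0\quad\mathbf{P}\text{-a.s.}$$

Thus Corollary 2.8 yields absolute regularity of $(Z_n,Y_n)_{n\in\mathbb{Z}}$.

It remains to prove nondegeneracy (in the sense of Definition 3.4). To this
end, we begin by noting that by the Markov property of
$(X_n,Y_n)_{n\in\mathbb{Z}}$

$$\mathbf{P}[Y_{1,n}\in\cdot\,|\mathcal{F}_{-\infty,0}\vee\mathcal{F}_{n+1,\infty}]
= \mathbf{P}^{X_0,Y_0}[Y_{1,n}\in\cdot\,|X_{n+1},Y_{n+1}].$$

Let $\mathbf{R}^{x,y}$ be the probability on $\mathcal{F}_+$ under which
$(X_k,Y_k)_{k\ge 0}$ is a Markov chain with transition kernel $P_0\otimes Q$
started at $(X_0,Y_0) = (x,y)$. Then
$\mathbf{P}^{x,y}|_{\mathcal{F}_{0,n+1}}\sim\mathbf{R}^{x,y}|_{\mathcal{F}_{0,n+1}}$
by Assumption 4.3. Therefore, by Lemma 4.7,

$$\mathbf{P}[Y_{1,n}\in\cdot\,|\mathcal{F}_{-\infty,0}\vee\mathcal{F}_{n+1,\infty}]
\sim\mathbf{R}^{X_0,Y_0}[Y_{1,n}\in\cdot\,|X_{n+1},Y_{n+1}]\quad\mathbf{P}\text{-a.s.}$$

But $X_{0,n+1}$ and $Y_{0,n+1}$ are independent under $\mathbf{R}^{x,y}$, so

$$\mathbf{R}^{X_0,Y_0}[Y_{1,n}\in\cdot\,|X_{n+1},Y_{n+1}]
= \mathbf{R}[Y_{1,n}\in\cdot\,|Y_0,Y_{n+1}]\quad\mathbf{P}\text{-a.s.},$$

where $\mathbf{R} = \int\mathbf{R}^{x,y}\,\lambda(dx,dy)$. Therefore, by Lemma 3.7,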

As this holds for any $n\in\mathbb{N}$, and using stationarity of $\mathbf{P}$,
it follows readily that the process $(Z_k,Y_k)_{k\in\mathbb{Z}}$ is
nondegenerate (Definition 3.4). □

Armed with this result, we prove first a dominated stability lemma. 

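Lemma 4.9. Suppose that Assumptions 4.2 and 4.3 hold. Let $\mathbf{Q}$ be a
probability measure on $\mathcal{F}_+$, and define
$\Sigma_n = \mathbf{Q}[\,\cdot\,|\mathcal{F}^Y_{0,n}]$. Suppose that

$$\mathbf{Q}|_{\mathcal{F}^Y_+\vee\mathcal{F}^0_{m,\infty}}\ll
\mathbf{P}|_{\mathcal{F}^Y_+\vee\mathcal{F}^0_{m,\infty}}\quad\text{for some }m\in\mathbb{N}.$$

Then $\|\Sigma_n-\Pi_n\|_{\mathcal{F}^0_{n-r,\infty}}\to 0$ $\mathbf{Q}$-a.s.
as $n\to\infty$ for any $r\in\mathbb{N}$.

Proof. Fix any $r$ and $m$ as in the statement of the lemma, and let
$n\ge m+r$. By the Bayes formula, we obtain for any set
$A\in\mathcal{F}^0_{n-r,\infty}$

$$\Sigma_n(A) = \frac{\mathbf{E}[1_A\Lambda\,|\,\mathcal{F}^Y_{0,n}]}{\mathbf{E}[\Lambda\,|\,\mathcal{F}^Y_{0,n}]}
\quad\mathbf{Q}\text{-a.s.},\qquad
\Lambda = \frac{d\mathbf{Q}|_{\mathcal{F}^Y_+\vee\mathcal{F}^0_{m,\infty}}}{d\mathbf{P}|_{\mathcal{F}^Y_+\vee\mathcal{F}^0_{m,\infty}}}$$

(we write $\mathbf{E}[X] = \mathbf{E}_{\mathbf{P}}[X]$ for simplicity). We
therefore have for $A\in\mathcal{F}^0_{n-r,\infty}$

$$\Sigma_n(A)-\Pi_n(A) = \mathbf{E}\bigg[1_A\bigg\{
\frac{\mathbf{E}[\Lambda\,|\,\mathcal{F}^Y_+\vee\mathcal{F}^0_{n-r,\infty}]}{\mathbf{E}[\Lambda\,|\,\mathcal{F}^Y_{0,n}]}-1\bigg\}\bigg|\mathcal{F}^Y_{0,n}\bigg]
\quad\mathbf{Q}\text{-a.s.}$$

As $\mathcal{F}^0_{n-r,\infty}$ is countably generated, the
$\mathbf{Q}$-exceptional set can be chosen independent of $A$ using a monotone
class argument. It follows that

$$\|\Sigma_n-\Pi_n\|_{\mathcal{F}^0_{n-r,\infty}}\le
\mathbf{E}\bigg[\bigg|\frac{\mathbf{E}[\Lambda\,|\,\mathcal{F}^Y_+\vee\mathcal{F}^0_{n-r,\infty}]}{\mathbf{E}[\Lambda\,|\,\mathcal{F}^Y_{0,n}]}-1\bigg|\,\bigg|\mathcal{F}^Y_{0,n}\bigg]
= \frac{\mathbf{E}[\Delta_n\,|\,\mathcal{F}^Y_{0,n}]}{\mathbf{E}[\Lambda\,|\,\mathcal{F}^Y_{0,n}]},$$

where we have defined

$$\Delta_n = \big|\mathbf{E}[\Lambda\,|\,\mathcal{F}^Y_+\vee\mathcal{F}^0_{n-r,\infty}]-\mathbf{E}[\Lambda\,|\,\mathcal{F}^Y_{0,n}]\big|.$$

We now estimate

$$\mathbf{E}[\Delta_n\,|\,\mathcal{F}^Y_{0,n}]\le\mathbf{E}[\Delta_n^u\,|\,\mathcal{F}^Y_{0,n}]+2\,\mathbf{E}[\Lambda 1_{\Lambda>u}\,|\,\mathcal{F}^Y_{0,n}],$$

where

$$\Delta_n^u = \big|\mathbf{E}[\Lambda 1_{\Lambda\le u}\,|\,\mathcal{F}^Y_+\vee\mathcal{F}^0_{n-r,\infty}]-\mathbf{E}[\Lambda 1_{\Lambda\le u}\,|\,\mathcal{F}^Y_{0,n}]\big|.$$

By Lemma 4.8 and Hunt's lemma, we obtain
$\mathbf{E}[\Delta_n^u\,|\,\mathcal{F}^Y_{0,n}]\to 0$ $\mathbf{P}$-a.s. as
$n\to\infty$. Moreover, as $\mathbf{E}[\Lambda\,|\,\mathcal{F}^Y_+]>0$
$\mathbf{Q}$-a.s., it follows that

$$\limsup_{n\to\infty}\|\Sigma_n-\Pi_n\|_{\mathcal{F}^0_{n-r,\infty}}\le
\frac{2\,\mathbf{E}[\Lambda 1_{\Lambda>u}\,|\,\mathcal{F}^Y_+]}{\mathbf{E}[\Lambda\,|\,\mathcal{F}^Y_+]}
\quad\mathbf{Q}\text{-a.s.}$$

Letting $u\to\infty$ completes the proof. □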

To use the previous lemma, we will decompose $\mathbf{P}^\mu$ on
$\mathcal{F}^Y_+\vee\mathcal{F}^0_{m,\infty}$ into an absolutely continuous
component with respect to $\mathbf{P}$ (to which Lemma 4.9 can be applied) and
a remainder that is negligible as $m\to\infty$. The following lemma ensures
that this can be done under the assumptions of Theorem 4.4.
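Lemma 4.10. Suppose Assumption 4.3 holds. If $\mu\in\mathcal{P}(E\times F)$
satisfies

$$\mu(E\times\cdot)\ll\lambda(E\times\cdot)\quad\text{and}\quad
\|\Pi_0^\mu-\Pi_0\|_{\mathcal{F}^0_{n,\infty}}\xrightarrow{n\to\infty}0\quad\mathbf{P}^\mu\text{-a.s.},$$

then for every $m\in\mathbb{N}$, we can choose a set
$C_m\in\mathcal{F}^Y_+\vee\mathcal{F}^0_{m,\infty}$ such that

$$\mathbf{P}^\mu[\,\cdot\cap C_m]\ll\mathbf{P}\text{ on }\mathcal{F}^Y_+\vee\mathcal{F}^0_{m,\infty}
\quad\text{and}\quad\mathbf{P}^\mu(C_m^c)\to 0\text{ as }m\to\infty.$$

Proof. By the Lebesgue decomposition theorem, we can choose for every
$m\in\mathbb{N}$ a set $C_m\in\sigma\{Y_0\}\vee\mathcal{F}^0_{m,\infty}$ such
that the following holds:

$$\mathbf{P}^\mu[\,\cdot\cap C_m]\ll\mathbf{P}\text{ on }\sigma\{Y_0\}\vee\mathcal{F}^0_{m,\infty}
\quad\text{and}\quad\mathbf{P}(C_m) = 1.$$

Now note that $\Pi_0(C_m^c) = 0$ $\mathbf{P}$-a.s., and therefore also
$\mathbf{P}^\mu$-a.s. as we have assumed that
$\mu(E\times\cdot)\ll\lambda(E\times\cdot)$. Thus we obtain
$\mathbf{P}^\mu$-a.s.

$$\Pi_0^\mu(C_m^c) = \Pi_0^\mu(C_m^c)-\Pi_0(C_m^c)\le
\|\Pi_0^\mu-\Pi_0\|_{\sigma\{Y_0\}\vee\mathcal{F}^0_{m,\infty}}
= \|\Pi_0^\mu-\Pi_0\|_{\mathcal{F}^0_{m,\infty}}.$$

Taking the expectation with respect to $\mathbf{P}^\mu$ and letting
$m\to\infty$, it follows using dominated convergence that
$\mathbf{P}^\mu(C_m^c)\to 0$ as $m\to\infty$.

It remains to show that $\mathbf{P}^\mu[\,\cdot\cap C_m]\ll\mathbf{P}$ on the
larger $\sigma$-field $\mathcal{F}^Y_+\vee\mathcal{F}^0_{m,\infty}$. To this
end, we will establish below the following claim:
$\mathbf{P}^\mu[\,\cdot\cap C_m]$-a.s.

$$\mathbf{P}^\mu[Y_{1,m-1}\in\cdot\,|\sigma\{Y_0\}\vee\mathcal{F}^0_{m,\infty}]\sim
\mathbf{P}[Y_{1,m-1}\in\cdot\,|\sigma\{Y_0\}\vee\mathcal{F}^0_{m,\infty}].$$

Let us first complete the proof assuming the claim. Let
$A\in\mathcal{F}^Y_+\vee\mathcal{F}^0_{m,\infty}$ be such that
$\mathbf{P}(A) = 0$. Then
$\mathbf{P}[A|\sigma\{Y_0\}\vee\mathcal{F}^0_{m,\infty}] = 0$ $\mathbf{P}$-a.s.,
and therefore also $\mathbf{P}^\mu[\,\cdot\cap C_m]$-a.s. But then the claim
implies that $\mathbf{P}^\mu[A|\sigma\{Y_0\}\vee\mathcal{F}^0_{m,\infty}] = 0$
$\mathbf{P}^\mu[\,\cdot\cap C_m]$-a.s., which yields
$\mathbf{P}^\mu(A\cap C_m) = 0$ as required.

We now proceed to prove the claim. Let $\mathbf{R}_m^\mu$ be the probability on
$\mathcal{F}_+$ under which $(X_k,Y_k)_{k\ge 0}$ is an inhomogeneous Markov
chain with initial law $(X_0,Y_0)\sim\mu$, whose transition kernel is given by
$P_0\otimes Q$ up to time $m$ and by $P$ after time $m$. Assumption 4.3
evidently implies that $\mathbf{R}_m^\mu\sim\mathbf{P}^\mu$, so

$$\mathbf{P}^\mu[Y_{1,m-1}\in\cdot\,|\sigma\{Y_0\}\vee\mathcal{F}^0_{m,\infty}]\sim
\mathbf{R}_m^\mu[Y_{1,m-1}\in\cdot\,|\sigma\{Y_0\}\vee\mathcal{F}^0_{m,\infty}]
\quad\mathbf{P}^\mu\text{-a.s.}$$

by Lemma 4.7. Now note that

$$\mathbf{R}_m^\mu[Y_{1,m-1}\in\cdot\,|Y_0,X_m,Y_m] = \mathbf{R}_m^\mu[Y_{1,m-1}\in\cdot\,|Y_0,Y_m],$$

as $Y_{1,m}$ and $X_m$ are conditionally independent given $Y_0$ under
$\mathbf{R}_m^\mu$. Therefore, we obtain using the Markov property and tower
property

$$\mathbf{R}_m^\mu[Y_{1,m-1}\in\cdot\,|\sigma\{Y_0\}\vee\mathcal{F}^0_{m,\infty}]
= \mathbf{R}_m^\mu[Y_{1,m-1}\in\cdot\,|Y_0,Y_m]\quad\mathbf{P}^\mu\text{-a.s.}$$

Proceeding in exactly the same manner for $\mathbf{P}$, we obtain

$$\mathbf{P}^\mu[Y_{1,m-1}\in\cdot\,|\sigma\{Y_0\}\vee\mathcal{F}^0_{m,\infty}]\sim
\mathbf{R}_m^\mu[Y_{1,m-1}\in\cdot\,|Y_0,Y_m]\quad\mathbf{P}^\mu[\,\cdot\cap C_m]\text{-a.s.},$$
$$\mathbf{P}[Y_{1,m-1}\in\cdot\,|\sigma\{Y_0\}\vee\mathcal{F}^0_{m,\infty}]\sim
\mathbf{R}_m^\lambda[Y_{1,m-1}\in\cdot\,|Y_0,Y_m]\quad\mathbf{P}^\mu[\,\cdot\cap C_m]\text{-a.s.}$$

It remains to note that
$\mathbf{R}_m^\mu[Y_{1,m-1}\in\cdot\,|Y_0,Y_m] = \mathbf{R}_m^\lambda[Y_{1,m-1}\in\cdot\,|Y_0,Y_m]$
$\mathbf{P}^\mu[\,\cdot\cap C_m]$-a.s., as $Y_{0,m}$ is (the initial segment
of) a Markov chain with transition kernel $Q$ under both $\mathbf{R}_m^\mu$ and
$\mathbf{R}_m^\lambda$. Thus the proof is complete. □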

The following corollary will be used a number of times. 

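Corollary 4.11. Suppose Assumption 4.3 holds. If $\mu$ satisfies

$$\mu(E\times\cdot)\ll\lambda(E\times\cdot)\quad\text{and}\quad
\|\Pi_0^\mu-\Pi_0\|_{\mathcal{F}^0_{n,\infty}}\xrightarrow{n\to\infty}0\quad\mathbf{P}^\mu\text{-a.s.},$$

then $\mathbf{P}^\mu|_{\mathcal{F}^Y_+}\ll\mathbf{P}|_{\mathcal{F}^Y_+}$. If
also $\lambda(E\times\cdot)\sim\mu(E\times\cdot)$, then
$\mathbf{P}^\mu|_{\mathcal{F}^Y_+}\sim\mathbf{P}|_{\mathcal{F}^Y_+}$.

Proof. Define the sets $C_m$ as in Lemma 4.10, and choose any
$A\in\mathcal{F}^Y_+$ with $\mathbf{P}(A) = 0$. Then
$\mathbf{P}^\mu(A) = \mathbf{P}^\mu(A\cap C_m^c)\le\mathbf{P}^\mu(C_m^c)\to 0$
as $m\to\infty$. This yields the first claim. On the other hand, note that the
proof of Lemma 4.10 does not use the invariance of $\lambda$. Thus we may
exchange the roles of $\mu$ and $\lambda$ in Lemma 4.10 to obtain also
$\mathbf{P}|_{\mathcal{F}^Y_+}\ll\mathbf{P}^\mu|_{\mathcal{F}^Y_+}$ when
$\lambda(E\times\cdot)\sim\mu(E\times\cdot)$. □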

We can now complete the proof of Theorem 4.4. 

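Proof of Theorem 4.4. Fix $\mu$ as in the statement of the theorem, and
define the corresponding sets $C_m$ as in Lemma 4.10. Let
$\mathbf{P}_m^\mu = \mathbf{P}^\mu[\,\cdot\,|C_m]$ and
$\mathbf{P}_m^{\mu,\perp} = \mathbf{P}^\mu[\,\cdot\,|C_m^c]$. By the Bayes
formula, we can write for any $A\in\mathcal{F}_+$

$$\mathbf{P}^\mu[A|\mathcal{F}^Y_{0,n}] =
\mathbf{P}_m^\mu[A|\mathcal{F}^Y_{0,n}]\,\mathbf{P}^\mu[C_m|\mathcal{F}^Y_{0,n}]
+\mathbf{P}_m^{\mu,\perp}[A|\mathcal{F}^Y_{0,n}]\,\mathbf{P}^\mu[C_m^c|\mathcal{F}^Y_{0,n}]
\quad\mathbf{P}^\mu\text{-a.s.}$$

In particular, if we define
$\Sigma_n^m = \mathbf{P}_m^\mu[\,\cdot\,|\mathcal{F}^Y_{0,n}]$, we can write

$$|\Pi_n^\mu(A)-\Pi_n(A)|\le|\Sigma_n^m(A)-\Pi_n(A)|\,\Pi_n^\mu(C_m)+\Pi_n^\mu(C_m^c)
\quad\mathbf{P}^\mu\text{-a.s.}$$

As $\mathcal{F}_+$ is countably generated, the $\mathbf{P}^\mu$-exceptional set
can be chosen independent of $A$ using a monotone class argument. We therefore
obtain

$$\mathbf{E}^\mu\Big[\limsup_{n\to\infty}\|\Pi_n^\mu-\Pi_n\|_{\mathcal{F}^0_{n-r,\infty}}\Big]\le
\mathbf{E}^\mu\Big[\limsup_{n\to\infty}\|\Sigma_n^m-\Pi_n\|_{\mathcal{F}^0_{n-r,\infty}}\,\Pi_n^\mu(C_m)\Big]
+2\,\mathbf{P}^\mu(C_m^c).$$

But the first term on the right vanishes by Lemma 4.9. Therefore, using that
$\mathbf{P}^\mu(C_m^c)\to 0$ as $m\to\infty$, we find that
$\|\Pi_n^\mu-\Pi_n\|_{\mathcal{F}^0_{n-r,\infty}}\to 0$ $\mathbf{P}^\mu$-a.s.

This completes the proof when $\mu(E\times\cdot)\ll\lambda(E\times\cdot)$. To
conclude, note that $\mathbf{P}$-a.s. convergence follows by Corollary 4.11
when $\lambda(E\times\cdot)\sim\mu(E\times\cdot)$. □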
4.3. Filter stability and ergodicity. Using the local stability Theorem 4.4
we can now proceed to obtain filter stability results that are applicable to
infinite-dimensional or weak-* ergodic models, in analogy with the ergodic
results obtained in section 2. While many variations on these results are
possible, we give two representative results that suffice in all the examples
that will be given in section 5 below. Besides stability, we will also
consider, following Kunita [21], the ergodic properties of the filtering
process $(\pi_n)_{n\ge 0}$ when it is considered as a measure-valued Markov
process.

4.3.1. Filter stability and local mixing. In this short subsection, we
assume that the state space $E$ of the unobserved process is contained in a
countable product $E\subseteq\prod_{i\in I}E^i$, where each $E^i$ is Polish. We
are therefore in the local mixing setting of section 2.2. In the present
section, we define

$$\mathcal{F}^J_{m,n} = \sigma\{X^J_{m,n},Y_{m,n}\}\quad\text{for }J\subseteq I,\ m\le n,$$

that is, we include the observations in the local filtration. We also denote
by $\mathcal{E}^J\subseteq\mathcal{B}(E)$ the cylinder $\sigma$-field generated
by the coordinates in $J\subseteq I$. The bivariate Markov chain
$(X_n,Y_n)_{n\ge 0}$ is said to be locally mixing if

$$\|\mathbf{P}^{x,y}-\mathbf{P}\|_{\mathcal{F}^J_{n,\infty}}\xrightarrow{n\to\infty}0
\quad\text{for all }(x,y)\in E\times F\text{ and }J\subseteq I,\ |J|<\infty.$$

It is easily seen that this coincides with the notion introduced in section 2.2.
The following filter stability result follows trivially from Corollary 4.6.

Corollary 4.12. Suppose that the Markov chain $(X_n,Y_n)_{n\ge 0}$ is locally
mixing and that Assumption 4.3 holds. Then for any
$\mu\in\mathcal{P}(E\times F)$ such that
$\mu(E\times\cdot)\ll\lambda(E\times\cdot)$, and for any $r\in\mathbb{N}$, we
have

$$\|\Pi_n^\mu-\Pi_n\|_{\mathcal{F}^J_{n-r,\infty}}\xrightarrow{n\to\infty}0
\quad\mathbf{P}^\mu\text{-a.s. for all }J\subseteq I,\ |J|<\infty.$$

In particular, the filter is stable in the sense

$$\|\pi_n^\mu-\pi_n\|_{\mathcal{E}^J}\xrightarrow{n\to\infty}0
\quad\mathbf{P}^\mu\text{-a.s. for all }J\subseteq I,\ |J|<\infty.$$

If $\mu(E\times\cdot)\sim\lambda(E\times\cdot)$, the convergence holds also
$\mathbf{P}$-a.s.

4.3.2. Filter stability and asymptotic coupling. The goal of this section
is to develop a filter stability counterpart of the weak-* ergodic theorem in
section 2.4. For the sake of transparency, we will restrict attention to a
special class of bivariate Markov chains, known as hidden Markov models, that
arise in many settings (cf. section 5). While our method is certainly also
applicable in more general situations, the hidden Markov assumption will allow
us to state concrete and easily verifiable conditions for weak-* filter
stability.

A hidden Markov model is a bivariate Markov chain $(X_k,Y_k)_{k\ge 0}$ (in the
Polish state space $E\times F$) whose transition kernel $P$ factorizes as

$$P(x,y,dx',dy') = P_0(x,dx')\,\Phi(x',dy')$$

for transition kernels $P_0: E\times\mathcal{B}(E)\to[0,1]$ and
$\Phi: E\times\mathcal{B}(F)\to[0,1]$. The special feature of such models is
that the unobserved process $(X_k)_{k\ge 0}$ is a Markov chain in its own
right, and the observations $(Y_k)_{k\ge 0}$ are conditionally independent
given $(X_k)_{k\ge 0}$. This is a common scenario when $(Y_k)_{k\ge 0}$
represent noisy observations of an underlying Markov chain $(X_k)_{k\ge 0}$. In
this setting, it is natural to consider just initial conditions for $X_0$,
rather than for the pair $(X_0,Y_0)$. We therefore define
$\mathbf{P}^x = \mathbf{P}^{\delta_x\otimes\Phi(x,\cdot)}$ for $x\in E$ and
$\mathbf{P}^\mu = \int\mathbf{P}^x\,\mu(dx)$ for $\mu\in\mathcal{P}(E)$, as
well as the corresponding filters $\pi_n^\mu,\Pi_n^\mu$. We will assume that
$P_0$ admits an invariant probability $\lambda\in\mathcal{P}(E)$, so that
$\lambda\otimes\Phi$ (which, in a slight abuse of notation, we again denote by
$\lambda$) is invariant for $(X_n,Y_n)_{n\ge 0}$ (this entails no loss of
generality if we assume, as we do, that $(X_n,Y_n)_{n\ge 0}$ admits an
invariant probability).

A hidden Markov model is called nondegenerate if the observation kernel
$\Phi$ admits a positive density with respect to some reference measure
$\varphi$ on $F$:

$$\Phi(x,dy) = g(x,y)\,\varphi(dy),\qquad g(x,y)>0\quad\text{for all }(x,y)\in E\times F.$$

Evidently, nondegeneracy of the hidden Markov model corresponds to the
validity of Assumption 4.3 for the bivariate Markov chain $(X_k,Y_k)_{k\ge 0}$.
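
To make the correspondence with Assumption 4.3 explicit, the following short computation may be helpful. It is only a sketch under an extra assumption: we take the reference measure $\varphi$ to be a probability measure (a $\sigma$-finite $\varphi$ can be replaced by an equivalent probability measure at the cost of modifying the density $g$). The symbols $\tilde g$ and the particular choice of $Q$ are our own notation for the objects appearing in Assumption 4.3, introduced here for illustration.

```latex
\begin{align*}
P(x,y,dx',dy')
  &= P_0(x,dx')\,\Phi(x',dy') \\
  &= g(x',y')\,P_0(x,dx')\,\varphi(dy') \\
  &= \tilde g(x,y,x',y')\,P_0(x,dx')\,Q(y,dy'),
\end{align*}
where $\tilde g(x,y,x',y') := g(x',y') > 0$ and $Q(y,dy') := \varphi(dy')$.
```

With this choice, $Q$ does not depend on $y$ at all, so the comparison chain with kernel $P_0\otimes Q$ has i.i.d. observations with law $\varphi$ that are independent of the unobserved process.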

We can now state our weak-* stability result for the filter in the hidden
Markov model setting; compare with the weak-* ergodic Theorem 2.10. In
the following, we fix a complete metric $d$ for the Polish space $E$. To allow
for the case that the observation law is discontinuous with respect to $d$
(see section 5 for examples), we introduce an auxiliary quantity $\tilde d$
that dominates the metric $d$. Let us note that it is not necessary for
$\tilde d$ to be a metric.

Theorem 4.13 (Weak-* filter stability). Let $(X_k,Y_k)_{k\ge 0}$ be a
nondegenerate hidden Markov model that admits an invariant probability
$\lambda$, and let $\tilde d(x,y)\ge d(x,y)$ for all $x,y\in E$. Suppose that
the following hold:

a. (Asymptotic coupling) There exists $\alpha>0$ such that

$$\forall\,x,x'\in E\quad\exists\,\mathbf{Q}\in\mathcal{C}(\mathbf{P}^x,\mathbf{P}^{x'})\quad\text{s.t.}\quad
\mathbf{Q}\bigg[\sum_{n=1}^\infty\tilde d(X_n,X_n')^2<\infty\bigg]\ge\alpha.$$

b. (Hellinger-Lipschitz observations) There exists $C<\infty$ such that

$$\int\Big\{\sqrt{g(x,y)}-\sqrt{g(x',y)}\Big\}^2\varphi(dy)\le C\,\tilde d(x,x')^2\quad\text{for all }x,x'\in E.$$

Then the filter is stable in the sense that

$$|\pi_n^\mu(f)-\pi_n^\nu(f)|\xrightarrow{n\to\infty}0\quad\mathbf{P}^\gamma\text{-a.s. for all }f\in\mathrm{Lip}(E),\ \mu,\nu,\gamma\in\mathcal{P}(E).$$

In particular, we obtain

$$\|\pi_n^\mu-\pi_n^\nu\|_{\mathrm{BL}}\xrightarrow{n\to\infty}0\quad\text{in }\mathbf{P}^\gamma\text{-probability for all }\mu,\nu,\gamma\in\mathcal{P}(E).$$

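Proof. The proof is similar to that of Theorem 2.10. Let $(\xi_n)_{n\ge 0}$ be
an i.i.d. sequence of standard Gaussian random variables independent of
$(X_n,Y_n)_{n\ge 0}$, so that we may consider in the following the extended
Markov chain $(X_n,\xi_n,Y_n)_{n\ge 0}$. Fix $f\in\mathrm{Lip}(E)$, and define
$F_n = f(X_n)+\xi_n$. Conditionally on $\mathcal{F}^X_+$, the process
$(F_n,Y_n)_{n\ge 0}$ is an independent sequence with

$$\mathbf{P}^\mu[(F_n,Y_n)_{n\ge k}\in A\,|\,\mathcal{F}^X_+] = \mathbf{R}^{X_{k,\infty}}(A) :=
\int 1_A(r,y)\prod_{n=k}^\infty\frac{e^{-(r_n-f(X_n))^2/2}}{\sqrt{2\pi}}\,dr_n\,g(X_n,y_n)\,\varphi(dy_n),
\qquad A\in\mathcal{B}(\mathbb{R}\times F)^{\otimes\mathbb{N}},$$

for all $\mu\in\mathcal{P}(E)$. We can now estimate as in the proof of
Lemma 2.11

$$\|\mathbf{R}^{x_{0,\infty}}-\mathbf{R}^{x'_{0,\infty}}\|^2\le
\sum_{n=0}^\infty\Big[2\{f(x_n)-f(x_n')\}^2+8C\,\tilde d(x_n,x_n')^2\Big]
\le(8C+2)\sum_{n=0}^\infty\tilde d(x_n,x_n')^2,$$

where we have used that $1-\prod_n(1-p_n)\le\sum_n p_n$ when $0\le p_n\le 1$
for all $n$. Therefore, proceeding exactly as in the proof of Theorem 2.10, we
find that for every $x,x'\in E$, there exists $n\ge 1$ such that we have

$$\|\mathbf{P}^x[F_{n,\infty},Y_{n,\infty}\in\cdot\,]-\mathbf{P}^{x'}[F_{n,\infty},Y_{n,\infty}\in\cdot\,]\|\le 2-\alpha.$$

Now note that the law $\mathbf{P}^x[F_{n,\infty},Y_{n,\infty}\in\cdot\,]$ does
not change if we condition additionally on $\xi_0,Y_0$ (as $\xi_0,Y_0$ are
independent of $X_{1,\infty},\xi_{1,\infty},Y_{1,\infty}$ under
$\mathbf{P}^x$). We can therefore apply Theorem 2.1 to conclude that

$$\|\mathbf{P}^\mu[F_{n,\infty},Y_{n,\infty}\in\cdot\,]-\mathbf{P}^\nu[F_{n,\infty},Y_{n,\infty}\in\cdot\,]\|\xrightarrow{n\to\infty}0
\quad\text{for all }\mu,\nu\in\mathcal{P}(E).$$

Moreover, note that $\mathbf{P}^\mu[Y_0\in\cdot\,]\sim\varphi$ for all
$\mu\in\mathcal{P}(E)$. Thus we can apply Corollary 4.6 to conclude (here
$\xi\in\mathcal{P}(\mathbb{R})$ is the standard Gaussian measure)

$$\|\pi_n^\mu f^{-1}*\xi-\pi_n f^{-1}*\xi\|\xrightarrow{n\to\infty}0\quad\mathbf{P}^\mu\text{-a.s. and }\mathbf{P}\text{-a.s.}$$

Applying the argument in the proof of Theorem 2.10 pathwise, we obtain

$$|\pi_n^\mu(f)-\pi_n(f)|\xrightarrow{n\to\infty}0\quad\mathbf{P}^\mu\text{-a.s. and }\mathbf{P}\text{-a.s.}$$

Thus by the triangle inequality, we have

$$|\pi_n^\mu(f)-\pi_n^\nu(f)|\le|\pi_n^\mu(f)-\pi_n(f)|+|\pi_n^\nu(f)-\pi_n(f)|\xrightarrow{n\to\infty}0\quad\mathbf{P}\text{-a.s.}$$

Finally, note that the assumptions of Corollary 4.11 are satisfied for any
initial measure, so that
$\mathbf{P}^\gamma|_{\mathcal{F}^Y_+}\sim\mathbf{P}|_{\mathcal{F}^Y_+}$ for
any $\gamma\in\mathcal{P}(E)$. Thus the above $\mathbf{P}$-a.s. convergence
also holds $\mathbf{P}^\gamma$-a.s., which yields the first conclusion.

To obtain the second conclusion, we argue as in the proof of Theorem
2.12. Fix $\varepsilon>0$. Let $K\subseteq E$ be a compact set such that
$\lambda(K)\ge 1-\varepsilon$, and define
$\chi(x) = (1-\varepsilon^{-1}d(x,K))_+$. Following the argument used in the
proof of Theorem 2.12, we can find functions
$f_1,\ldots,f_k\in\mathrm{Lip}(E)$ such that

$$\|\pi_n^\mu-\pi_n\|_{\mathrm{BL}}\le\max_{i=1,\ldots,k}|\pi_n^\mu(f_i\chi)-\pi_n(f_i\chi)|
+|\pi_n^\mu(\chi)-\pi_n(\chi)|+2\pi_n(K^c)+6\varepsilon$$

for all $n\ge 0$. Taking the expectation and letting $n\to\infty$, we obtain

$$\limsup_{n\to\infty}\mathbf{E}\big[\|\pi_n^\mu-\pi_n\|_{\mathrm{BL}}\big]\le 8\varepsilon.$$

As $\varepsilon>0$ is arbitrary, this implies that

$$\|\pi_n^\mu-\pi_n\|_{\mathrm{BL}}\xrightarrow{n\to\infty}0\quad\text{in }\mathbf{P}\text{-probability}.$$

But $\mathbf{P}^\gamma|_{\mathcal{F}^Y_+}\sim\mathbf{P}|_{\mathcal{F}^Y_+}$, so
the convergence is also in $\mathbf{P}^\gamma$-probability. Applying the
triangle inequality and dominated convergence completes the proof. □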

Remark 4.14. In Theorem 4.13, we obtain a.s. stability of the filter for 
individual Lipschitz functions, but only stability in probability for the || • ||bl 
norm. It is not clear whether the latter could be improved to a.s. convergence 
(except when E is compact, in which case the compactness argument used 
in the proof above directly yields a.s. convergence). The problem is that we 
do not know whether the null set in the a.s. stability result can be made 
independent of the choice of Lipschitz function; if this were the case, the 
method used in Theorem 2.10 could be used to obtain a.s. convergence. 
4.3.3. Ergodicity of the filter. We developed above a number of filter
stability results that ensure convergence of conditional expectations of the
form
$|\mathbf{E}^\mu[f(X_n)|\mathcal{F}^Y_{0,n}]-\mathbf{E}[f(X_n)|\mathcal{F}^Y_{0,n}]|\to 0$.
This can evidently be viewed as a natural conditional counterpart to the
classical ergodic theory of Markov chains, which ensures that
$|\mathbf{E}^\mu[f(X_n)]-\mathbf{E}[f(X_n)]|\to 0$. In this section,
following Kunita [21], we develop a different ergodic property of the filter.
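
The stability property described above can be observed numerically in the simplest possible setting. The following is only an illustrative sketch, not part of the theory developed here: it assumes a two-state hidden Markov model with binary observations, and the matrices `P0` and `G`, the function names, and the initial laws are hypothetical choices made for this example. Two filters started from very different initial laws are driven by the same observation sequence, and the total variation gap between them is recorded at each time.

```python
import random

# Hypothetical two-state model chosen for illustration only.
P0 = [[0.9, 0.1], [0.2, 0.8]]  # transition matrix of the unobserved chain
G = [[0.7, 0.3], [0.4, 0.6]]   # G[x][y] = probability of observing y in state x (all > 0)


def simulate_observations(n_steps, seed=0):
    """Simulate Y_0, ..., Y_{n_steps-1} from the model, started at X_0 = 0."""
    rng = random.Random(seed)
    x = 0
    ys = []
    for _ in range(n_steps):
        ys.append(0 if rng.random() < G[x][0] else 1)  # Y_k given X_k
        x = 0 if rng.random() < P0[x][0] else 1        # X_{k+1} given X_k
    return ys


def filter_path(mu, ys):
    """Return the filters pi_0, ..., pi_n for initial law mu of X_0."""
    pis = []
    pi = None
    for n, y in enumerate(ys):
        if n == 0:
            pred = list(mu)
        else:
            # Prediction step: push the previous filter through P0.
            pred = [sum(pi[x] * P0[x][xp] for x in range(2)) for xp in range(2)]
        # Correction step: reweight by the observation likelihood, normalize.
        unnorm = [pred[x] * G[x][y] for x in range(2)]
        total = sum(unnorm)
        pi = [w / total for w in unnorm]
        pis.append(pi)
    return pis


def tv_gaps(mu, nu, ys):
    """Total variation gap (sum of absolute differences) between two filters."""
    return [
        sum(abs(a - b) for a, b in zip(p, q))
        for p, q in zip(filter_path(mu, ys), filter_path(nu, ys))
    ]


if __name__ == "__main__":
    gaps = tv_gaps([0.99, 0.01], [0.01, 0.99], simulate_observations(200))
    print("gap at time 0:", gaps[0])
    print("gap at time 199:", gaps[-1])
```

In this finite, strictly positive example the gap decays geometrically, which is a much stronger statement than anything needed in the infinite-dimensional setting; the example is meant only to make the object $|\pi_n^\mu(f)-\pi_n^\nu(f)|$ concrete.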

It is well known (and a simple exercise using the Bayes formula) that
the filter $\pi_n^\mu$ can be computed in a recursive fashion. In particular,
under the nondegeneracy Assumption 4.3, we have
$\pi_{n+1}^\mu = U(\pi_n^\mu,Y_n,Y_{n+1})$ with

$$U(\nu,y,y')(A) = \frac{\int 1_A(x')\,g(x,y,x',y')\,P_0(x,dx')\,\nu(dx)}{\int g(x,y,x',y')\,P_0(x,dx')\,\nu(dx)}.$$

Let $H:\mathcal{P}(E)\times F\to\mathbb{R}$ be a bounded measurable function.
Then

$$\mathbf{E}^\mu[H(\pi_{n+1}^\mu,Y_{n+1})\,|\,\mathcal{F}^Y_{0,n}]
= \mathbf{E}^\mu[H(U(\pi_n^\mu,Y_n,Y_{n+1}),Y_{n+1})\,|\,\mathcal{F}^Y_{0,n}]
= \Gamma H(\pi_n^\mu,Y_n),$$

where the kernel
$\Gamma:\mathcal{P}(E)\times F\times\mathcal{B}(\mathcal{P}(E)\times F)\to[0,1]$
is given by

$$\Gamma H(\nu,y) = \int H(U(\nu,y,y'),y')\,P(x,y,dx',dy')\,\nu(dx).$$

Thus we see that the process $(\pi_n^\mu,Y_n)_{n\ge 0}$ is itself a
$(\mathcal{P}(E)\times F)$-valued Markov chain under $\mathbf{P}^\mu$ with
transition kernel $\Gamma$. In the hidden Markov model setting of section
4.3.2, $\Gamma(\nu,y,A)$ does not depend on $y$, so that in this special case
even the filter $(\pi_n^\mu)_{n\ge 0}$ itself is a $\mathcal{P}(E)$-valued
Markov chain.

In view of this Markov property of the filter, it is now natural to ask about
the ergodic properties of the filtering process itself. Generally speaking, we
would like to know whether the ergodic properties of the underlying Markov
chain $(X_n,Y_n)_{n\ge 0}$ with transition kernel $P$ are "lifted" to the
measure-valued Markov chain $(\pi_n^\mu,Y_n)_{n\ge 0}$ with transition kernel
$\Gamma$. Following the seminal work of Kunita [21], such questions have been
considered by a number of authors [37, 3, 5, 43, 39]. We will focus here on the
question of unique ergodicity, where the following result (essentially due to
Kunita) is known.

Theorem 4.15. Suppose that Assumption 4.3 holds, and that the transition
kernel $P$ admits a unique invariant measure
$\lambda\in\mathcal{P}(E\times F)$. Then the transition kernel $\Gamma$ admits
a unique invariant measure $\Lambda\in\mathcal{P}(\mathcal{P}(E)\times F)$ iff

$$\bigcap_{n\ge 0}\mathcal{F}^Y_{-\infty,0}\vee\mathcal{F}^X_{-\infty,-n} = \mathcal{F}^Y_{-\infty,0}\mod\mathbf{P}.$$

We refer to [43] for a full proof in the hidden Markov model setting, which
is easily adapted to the more general setting considered here (sufficiency is
also shown in our setting in the proof of [39, Theorem 2.12]).
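
The recursion $\pi_{n+1}^\mu = U(\pi_n^\mu,Y_n,Y_{n+1})$ from the beginning of this subsection can be made concrete in a finite setting. The sketch below is an illustration under simplifying assumptions, not the general construction: it assumes a hidden Markov model with finitely many states and observation symbols, so that the density reduces to $g(x',y')$ and the update does not depend on the previous observation. All function names and the data layout (`P0` as a row-stochastic matrix, `g[x][y]` as a strictly positive emission table) are hypothetical. A brute-force computation over all hidden paths is included to check that the recursion indeed produces the conditional law of $X_n$ given $Y_0,\ldots,Y_n$.

```python
from itertools import product


def filter_update(nu, P0, g, y_next):
    """One step of the recursion U(nu, y, y') for a finite-state hidden Markov model.

    In the hidden Markov case the density is g(x', y'), so the previous
    observation y plays no role and is omitted from the signature.
    """
    n_states = len(nu)
    # Unnormalized mass of each new state x': sum_x nu(x) P0(x, x') g(x', y').
    unnorm = [
        sum(nu[x] * P0[x][xp] for x in range(n_states)) * g[xp][y_next]
        for xp in range(n_states)
    ]
    total = sum(unnorm)  # strictly positive because g > 0
    return [w / total for w in unnorm]


def run_filter(mu, P0, g, ys):
    """Filter given Y_0, ..., Y_n when X_0 ~ mu and Y_0 is drawn from g(X_0, .)."""
    unnorm = [mu[x] * g[x][ys[0]] for x in range(len(mu))]
    total = sum(unnorm)
    pi = [w / total for w in unnorm]
    for y in ys[1:]:
        pi = filter_update(pi, P0, g, y)
    return pi


def brute_force_filter(mu, P0, g, ys):
    """Reference computation: sum the joint path weights over all hidden paths."""
    n_states = len(mu)
    post = [0.0] * n_states
    for path in product(range(n_states), repeat=len(ys)):
        w = mu[path[0]] * g[path[0]][ys[0]]
        for k in range(1, len(ys)):
            w *= P0[path[k - 1]][path[k]] * g[path[k]][ys[k]]
        post[path[-1]] += w  # accumulate by the final hidden state
    total = sum(post)
    return [w / total for w in post]


if __name__ == "__main__":
    P0 = [[0.9, 0.1], [0.2, 0.8]]
    g = [[0.7, 0.3], [0.4, 0.6]]
    ys = [0, 1, 1, 0]
    print(run_filter([0.5, 0.5], P0, g, ys))
    print(brute_force_filter([0.5, 0.5], P0, g, ys))
```

The recursive form only needs the current filter and the new observation, which is exactly what makes the filtering process a Markov chain on $\mathcal{P}(E)$ in the hidden Markov setting; the brute-force version is exponential in the number of observations and serves only as a check.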

To prove unique ergodicity of the filter, we must therefore establish the 
measure-theoretic identity in Theorem 4.15. The goal of this section is to 
accomplish this task under the same assumptions we have used for filter 
stability: local mixing or asymptotic couplings. In fact, slightly weaker forms 
of the assumptions of Corollary 4.12 or Theorem 4.13 will suffice. We refer 
to sections 4.3.1 and 4.3.2 for the notation used in the following results. 

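Theorem 4.16. Suppose that Assumption 4.3 holds. Let $(X_n,Y_n)_{n\ge 0}$
with $E\subseteq\prod_{i\in I}E^i$ be uniquely ergodic and a.e. locally mixing
in the sense

$$\|\mathbf{P}^{x,y}-\mathbf{P}\|_{\mathcal{F}^J_{n,\infty}}\xrightarrow{n\to\infty}0
\quad\text{for }\lambda\text{-a.e. }(x,y)\text{ and all }J\subseteq I,\ |J|<\infty.$$

Then the filter transition kernel $\Gamma$ admits a unique invariant measure.

Theorem 4.17. Let $(X_k,Y_k)_{k\ge 0}$ be a hidden Markov model that is
nondegenerate and that admits a unique invariant probability $\lambda$.
Moreover, let $\tilde d(x,y)\ge d(x,y)$ for all $x,y\in E$, and suppose that
the following hold:

a. (Asymptotic coupling) For $\lambda\otimes\lambda$-a.e. $(x,x')\in E\times E$,

$$\exists\,\mathbf{Q}\in\mathcal{C}(\mathbf{P}^x,\mathbf{P}^{x'})\quad\text{such that}\quad
\mathbf{Q}\bigg[\sum_{n=1}^\infty\tilde d(X_n,X_n')^2<\infty\bigg]>0.$$

b. (Hellinger-Lipschitz observations) There exists $C<\infty$ such that

$$\int\Big\{\sqrt{g(x,y)}-\sqrt{g(x',y)}\Big\}^2\varphi(dy)\le C\,\tilde d(x,x')^2\quad\text{for all }x,x'\in E.$$

Then the filter transition kernel $\Gamma$ admits a unique invariant measure.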

The remainder of this section is devoted to the proof of these results. We 
must begin by obtaining a local result in the setting of section 4.2. To this 
end, we will need the following structure result for the invariant measure A. 

Lemma 4.18. Suppose that Assumptions 4.2 and 4.3 hold. Then the invariant
measure $\lambda$ satisfies
$\lambda(dx,dy)\sim\lambda(dx\times F)\otimes\lambda(E\times dy)$.

Proof. Assumption 4.2 and Jensen's inequality yield P-a.s. 
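$$\mathbf{E}\big[\|\mathbf{P}[Y_n\in\cdot\,|Y_0]-\lambda(E\times\cdot)\|\big]\le
\mathbf{E}\big[\|\mathbf{P}^{X_0,Y_0}[Y_n\in\cdot\,]-\lambda(E\times\cdot)\|\big]\xrightarrow{n\to\infty}0.$$
The result follows by applying [39, Prop. 3.3] to the process
$(Y_n,X_n)_{n\ge 0}$. □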
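We can now prove the following local result.

Corollary 4.19. Suppose that Assumptions 4.2 and 4.3 hold. Then

$$\mathbf{P}\bigg[A\,\bigg|\bigcap_{n\ge 0}\mathcal{F}^Y_{-\infty,0}\vee\mathcal{F}^0_{-\infty,-n}\bigg]
= \mathbf{P}[A\,|\,\mathcal{F}^Y_{-\infty,0}]\quad\mathbf{P}\text{-a.s. for every }A\in\mathcal{F}^0_{-\infty,\infty}.$$

Proof. By Lemma 4.18, we have
$\lambda(dx,dy) = h(x,y)\,\lambda(dx\times F)\,\lambda(E\times dy)$ for a
strictly positive measurable function $h$. Define the kernel

$$\lambda_x(A) = \frac{\int 1_A(y)\,h(x,y)\,\lambda(E\times dy)}{\int h(x,y)\,\lambda(E\times dy)}.$$

By the Bayes formula $\lambda_{X_0} = \mathbf{P}[Y_0\in\cdot\,|X_0]$.
Assumption 4.2, disintegration, and Jensen's inequality yield a set
$H\subseteq E$ with $\lambda(H\times F) = 1$ such that

$$\mathbf{E}^{\delta_x\otimes\lambda_x}\big[\|\mathbf{P}^{x,Y_0}-\mathbf{P}\|_{\mathcal{F}^0_{n,\infty}}\big]\xrightarrow{n\to\infty}0
\quad\text{and}\quad
\mathbf{E}^{\delta_x\otimes\lambda_x}\big[\|\Pi_0-\mathbf{P}\|_{\mathcal{F}^0_{n,\infty}}\big]\xrightarrow{n\to\infty}0$$

for all $x\in H$. But note that
$\Pi_0^{\delta_x\otimes\lambda_x} = \mathbf{P}^{x,Y_0}$ holds
$\mathbf{P}^{\delta_x\otimes\lambda_x}$-a.s. Thus

$$\lambda_x\sim\lambda(E\times\cdot)\quad\text{and}\quad
\|\Pi_0^{\delta_x\otimes\lambda_x}-\Pi_0\|_{\mathcal{F}^0_{n,\infty}}\xrightarrow{n\to\infty}0
\quad\mathbf{P}^{\delta_x\otimes\lambda_x}\text{-a.s.}$$

for every $x\in H$ by the definition of $\lambda_x$ and the triangle
inequality (we have used that
$\|\Pi_0^{\delta_x\otimes\lambda_x}-\Pi_0\|_{\mathcal{F}^0_{n,\infty}}$ is
pointwise decreasing to establish a.s. convergence). Therefore, by Theorem 4.4,
we obtain for every $x\in H$

$$\|\Pi_n^{\delta_x\otimes\lambda_x}-\Pi_n\|_{\mathcal{F}^0_{n-r,\infty}}\xrightarrow{n\to\infty}0
\quad\mathbf{P}^{\delta_x\otimes\lambda_x}\text{-a.s. for any }r\in\mathbb{N}.$$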

Now note that n^^'"®^""" (^) = P[A|(7Xo V for ah A G J+, for example, 
by Lemma 3.11. It follows that for any yl G 3^^^, we have the convergence 



But the Markov property of {Xn,Yn)nez yields B[1a o e"-''|aXo V 3"^„] 
E[l^ o 1 3"'5oo 3"^oo,n]- Therefore, using stationarity, we obtain 



E[|E[1^ o e~^\r V 3-^^,_J - E[l^ o G-n9-^„,o]|] 0. 

The lemma now follows by the martingale convergence theorem for A G 
3^j. . As r is arbitrary, a monotone class argument concludes the proof. □ 

The proof of Theorem 4.16 is now essentially trivial. 
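Proof of Theorem 4.16. By Corollary 4.19, we have

$$\mathbf{P}\bigg[A\,\bigg|\bigcap_{n\ge 0}\mathcal{F}^Y_{-\infty,0}\vee\mathcal{F}^J_{-\infty,-n}\bigg]
= \mathbf{P}[A\,|\,\mathcal{F}^Y_{-\infty,0}]\quad\mathbf{P}\text{-a.s. for every }A\in\mathcal{F}^J_{-\infty,\infty}$$

whenever $J\subseteq I$, $|J|<\infty$. A monotone class argument yields the
conclusion for all $A\in\mathcal{F}$. Thus the $\sigma$-field identity of
Theorem 4.15 holds. □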

We now turn to the proof of Theorem 4.17. 

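Proof of Theorem 4.17. Proceeding precisely as in the proof of Theorem
4.13 (and adopting the same notation as is used there), we find that
for $\lambda\otimes\lambda$-a.e. $(x,x')\in E\times E$, there exists $n\ge 1$
such that

$$\|\mathbf{P}^x[F_{n,\infty},Y_{n,\infty}\in\cdot\,]-\mathbf{P}^{x'}[F_{n,\infty},Y_{n,\infty}\in\cdot\,]\|<2.$$

From Theorem 2.2, it follows that

$$\|\mathbf{P}^{x,y}[F_{n,\infty},Y_{n,\infty}\in\cdot\,]-\mathbf{P}[F_{n,\infty},Y_{n,\infty}\in\cdot\,]\|\xrightarrow{n\to\infty}0
\quad\text{for }\lambda\text{-a.e. }(x,y)\in E\times F.$$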

Applying Corollary 4.19, we find that 



A 



oo.—n 



n>0 



P[A|3^^] P-a.s. for every A G J; 



FY 



where 3"™;! = (y{Xm,n,im.,n] and 3"m,n = CF{Fjn,n,ym,n]- In particular, if 

G = g{f{X_m) + i-ra, f{Xm) + Y-m, ■ ■ ■ ,Ym) 

for some bounded continuous function g : 1^^"^+^ x i^'^m+i _^ then 



E 



G 



n>0 



E[G|J^] P-a.s. 



as cr{,^_oo,-n} is independent of V 3"^ V 0"{^_m^m} for n > m. Now 
note that nothing in the proof relied on the fact that are Gaussian with 
unit variance; we can replace by e^k for any e > and attain the same 
conclusion. Letting e — )• 0, we find that for any / G Lip(£'), m > 0, and 
bounded continuous g : M^^^^ x _^ ^j^g above identity holds for 



G — g{f{X^jji), • • • 5 f{Xm),Y-mi • • • 5 ^m)- 
The remainder of the proof is a routine approximation argument. As $E$ is
Polish, we can choose a countable dense subset $E'\subseteq E$. Then the
countable family of open balls $\{B(x,\delta): x\in E',\ \delta\in\mathbb{Q}_+\}$
(where $B(x,\delta)$ is the open ball with center $x$ and radius $\delta$)
generates the Borel $\sigma$-field $\mathcal{B}(E)$. Arrange these open balls
arbitrarily as a sequence $(B_k)_{k\ge 1}$, and define the functions

= E 4 w = E ^"'"'y ^ e E). 

k=l k=l 

Then if. is bounded and Lipschitz for every r, 6, and if. ^ t as 6 I 0, r ^ oo. 
Choosing f = if. and taking limits, we obtain the above identity for 

G = g{i{X-rn)-, ■ ■ ■ 1 A^ra)-, Y-m-, • ■ • , ^m) 

for any m > and bounded continuous g : M^"^"''^ x i?"2m+i _i. ]^ ^ monotone 
class argument shows that we may choose G to be any bounded a{i{Xk),Yk : 
k £ Z}-measurable function. But = so the proof is complete. □ 

4.4. Continuous time. Up to this point we have considered only discrete- 
time processes and Markov chains. However, continuous time processes are 
of equal interest in many applications: indeed, most of the examples that we 
will consider in section 5 will be in continuous time. The goal of this section 
is to extend our main filter stability results to the continuous time setting. 

In principle, we can view continuous time processes as a special case of the
discrete time setting. If $(x_t, y_t)_{t\ge 0}$ is a continuous time Markov process with
càdlàg paths, then we can define the associated discrete-time Markov chain
$(X_n, Y_n)_{n\ge 0}$ with values in the Skorokhod space $D([0,1]; E \times F)$ by setting
$X_n = (x_t)_{t\in[n,n+1]}$ and $Y_n = (y_t)_{t\in[n,n+1]}$. When we are dealing with the
(unconditional) ergodic theory of $(x_t, y_t)_{t\ge 0}$, we can obtain continuous-time
ergodic results directly from the corresponding results for the discrete-time
chain $(X_n, Y_n)_{n\ge 0}$. However, in the conditional setting, two issues arise.

First, note that in the unconditional setting, the marginal law $\mathbf{P}[x_t \in \cdot\,]$ for
$t \in [n, n+1]$ is a coordinate projection of $\mathbf{P}[X_n \in \cdot\,]$. However, this is not true
for the filter: the projection of $\mathbf{P}[X_n \in \cdot\,|Y_0, \ldots, Y_n]$ gives $\mathbf{P}[x_t \in \cdot\,|(y_s)_{s\in[0,n+1]}]$,
not the continuous time filter $\pi_t = \mathbf{P}[x_t \in \cdot\,|(y_s)_{s\in[0,t]}]$. We must therefore
get rid of the additional observation segment $(y_s)_{s\in[t,n+1]}$ that appears in
the projection. This is precisely what will be done in this section.

Second, in continuous time, numerous subtleties arise in defining the filtering
process $(\pi_t)_{t\ge 0}$ as a stochastic process with sufficiently regular sample
paths. Such issues would have to be dealt with carefully if we wanted to obtain,
for example, almost sure filter stability results in the continuous time
setting. The structure of nonlinear filters in continuous time is a classical
topic in stochastic analysis (see, for example, [25, 47, 22]) that provides the
necessary tools to address such problems. However, in the present setting,
such regularity issues are purely technical in nature and do not introduce
any new ideas in the ergodic theory of nonlinear filters. We therefore choose
to circumvent these issues by considering only stability in probability in the
continuous time setting, in which case regularity issues can be avoided.

A final issue that arises in the continuous time setting is that, unlike in 
discrete time, one must make a distinction between general bivariate Markov 
processes and hidden Markov processes, as we will presently explain. 

Recall that a discrete-time hidden Markov model is defined by the fact
that $(X_n)_{n\ge 0}$ is itself Markov and $(Y_n)_{n\ge 0}$ are conditionally independent
given $(X_n)_{n\ge 0}$. In continuous time, we cannot assign a (conditionally) independent
random variable to every time $t \in \mathbb{R}_+$. Instead, we consider an integrated
form of the observations where $(x_t)_{t\ge 0}$ is a Markov process and $(y_t)_{t\ge 0}$
has conditionally independent increments given $(x_t)_{t\ge 0}$. This is known as a
Markov additive process [6], and constitutes the natural continuous-time
counterpart to a hidden Markov model [47]. For example, the most common
observation model in continuous time is the "white noise" model [25]

$$y_t = \int_0^t h(x_s)\,ds + W_t,$$

where $(W_t)_{t\ge 0}$ is a Brownian motion independent of $(x_t)_{t\ge 0}$. Formally, $dy_t/dt$
represents the observation of $h(x_t)$ corrupted by white noise, but the integrated
form is used to define a mathematically sensible model. In this
example, the pair $(x_t, y_t)_{t\ge 0}$ is evidently a Markov additive process.

In principle, a continuous-time hidden Markov process is a special case
of a bivariate Markov process as in the discrete time setting. Unfortunately,
as $y_t$ is an additive process, it cannot be positive recurrent except in trivial
cases, so the pair $(x_t, y_t)_{t\ge 0}$ does not admit an invariant probability. We must
therefore take care to utilize explicitly the fact that it is the increments of
$y_t$, and not $y_t$ itself, that will be stationary under the invariant distribution.
This does not introduce any complications into our theory: both the bivariate
Markov setting and the Markov additive setting can be treated in exactly
the same manner. However, two distinct sets of notation are required for
these two settings. In order to avoid notational confusion, we will develop
our continuous time results below in the hidden Markov process setting only
(all our examples in section 5 will be of this form). The same approach can
however be adapted to the bivariate Markov setting with minimal effort.

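The "white noise" observation model described above can be illustrated numerically. The following is only a minimal sketch, assuming a scalar signal sampled on a uniform time grid and a plain Euler discretization; the function name, the grid, and the choice of $h$ are illustrative assumptions and are not part of the paper.

```python
import math
import random


def simulate_white_noise_observations(x_path, h, dt, rng):
    """Euler discretization of y_t = int_0^t h(x_s) ds + W_t on a uniform grid.

    x_path : list of signal values sampled at times 0, dt, 2*dt, ...
    h      : observation function (an illustrative choice of the caller)
    dt     : grid spacing
    rng    : random.Random instance supplying the Brownian increments
    Returns the list y_0 = 0, y_dt, ... with the same length as x_path.
    """
    y = [0.0]
    for x in x_path[:-1]:
        # Brownian increment over one grid step has variance dt.
        dW = rng.gauss(0.0, math.sqrt(dt))
        y.append(y[-1] + h(x) * dt + dW)
    return y
```

Running the function twice with identically seeded generators but different observation functions isolates the drift term: the two outputs differ exactly by the discretized integral of the difference of the two functions along the signal path.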
4.4.1. The continuous time setting. In the remainder of this section, we
consider a continuous-time process $(x_t, y_t)_{t\ge 0}$ with càdlàg paths, where $x_t$
takes values in a Polish space $E$ and $y_t$ takes values in a Polish topological
vector space $F$. We realize this process on the canonical path space $\Omega =
D(\mathbb{R}_+; E \times F)$ endowed with its Borel $\sigma$-field $\mathcal{F}$, such that $x_t(\xi,\eta) = \xi(t)$
and $y_t(\xi,\eta) = \eta(t)$. We define for $s \le t$ the $D([0,t-s];E)$-valued random
variable $x_{s,t} = (x_r)_{r\in[s,t]}$ and the $\sigma$-field $\mathcal{F}^X_{s,t} = \sigma\{x_{s,t}\}$. Moreover, we define
the $D([0,t-s];F)$-valued random variable $y_{s,t}$ and corresponding $\sigma$-fields

$$y_{s,t} = (y_r - y_s)_{r\in[s,t]}, \qquad \mathcal{F}^Y_{s,t} = \sigma\{y_{s,t}\}, \qquad \mathcal{F}_{s,t} = \mathcal{F}^X_{s,t} \vee \mathcal{F}^Y_{s,t}.$$

The shift $\Theta^t : \Omega \to \Omega$ is defined as $\Theta^t(\xi,\eta)(s) = (\xi(s+t), \eta(s+t) - \eta(t))$. Let
us emphasize that the observation segment $y_{s,t}$ and the shift $\Theta^t$ are defined
differently than in the discrete time setting: the present choice accounts for
the additivity of the observations, which we introduce next.
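The recentering built into the observation segment and the shift can be made concrete on sampled paths. The sketch below assumes paths stored as Python lists on a common uniform grid, with grid indices standing in for times; the function names are hypothetical and serve only to illustrate the definitions.

```python
def observation_segment(y, i, j):
    """Discrete analogue of y_{s,t} = (y_r - y_s)_{r in [s,t]} for grid indices i <= j."""
    return [y[r] - y[i] for r in range(i, j + 1)]


def shift(xi, eta, i):
    """Discrete analogue of Theta^t(xi, eta)(s) = (xi(s+t), eta(s+t) - eta(t)).

    The signal path is simply restarted at grid index i, while the observation
    path is restarted and recentered so that it begins at zero again.
    """
    return xi[i:], [value - eta[i] for value in eta[i:]]
```

With this convention, shifting by `i` and then by `j` agrees with shifting by `i + j`, and the recentered observation path always starts at zero, which reflects the fact that only increments of the observations carry information.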

In the continuous time setting, we will assume that the canonical process
is a hidden Markov process or Markov additive process: that is, $(x_t, y_t)_{t\ge 0}$ is
a time-homogeneous Markov process such that $\mathbf{E}[f(x_t, y_t - y_0)|x_0, y_0]$ does
not depend on $y_0$ for any bounded measurable function $f$. It is not difficult
to verify that this assumption corresponds to the following two properties:
the process $(x_t)_{t\ge 0}$ is Markov in its own right, and the process $(y_t)_{t\ge 0}$ has
conditionally independent increments given $(x_t)_{t\ge 0}$ (see, for example, [6]).

In the following, we define the probability $\mathbf{P}^x$ on $\mathcal{F}_{0,\infty}$ as the law of the
Markov additive process $(x_t, y_t - y_0)_{t\ge 0}$ started at $x_0 = x \in E$, and let
$\mathbf{P}^\mu = \int \mathbf{P}^x \mu(dx)$ for $\mu \in \mathcal{P}(E)$. We will assume the existence of an invariant
probability $\lambda \in \mathcal{P}(E)$ so that $\mathbf{P}^\lambda$ is invariant under the shift $\Theta^t$ for all $t \ge 0$.
We define $\mathbf{P} = \mathbf{P}^\lambda$, and introduce the continuous-time nonlinear filters

$$\pi_t^\mu = \mathbf{P}^\mu[x_t \in \cdot\,|\mathcal{F}^Y_{0,t}], \qquad \pi_t = \mathbf{P}[x_t \in \cdot\,|\mathcal{F}^Y_{0,t}].$$

As we will consider convergence in probability only, we will not worry about
the regularity of $\pi_t^\mu$ as a function of $t$ (that is, for each $t \ge 0$, we may choose
any version of the above regular conditional probabilities).

The Markov additive process $(x_t, y_t)_{t\ge 0}$ is said to be nondegenerate if for
every $\delta \in\, ]0,\infty[$, there exists a reference measure $\varphi_\delta$ on $D([0,\delta]; F)$ and a
strictly positive function $g_\delta : D([0,\delta]; E \times F) \to\, ]0,\infty[$ such that

$$\mathbf{P}^z[y_{t,t+\delta} \in A|\mathcal{F}^X_{0,\infty}] = \int 1_A(\eta)\, g_\delta(x_{t,t+\delta}, \eta)\, \varphi_\delta(d\eta) \quad \mathbf{P}^z\text{-a.s.}$$

for all $t \ge 0$, $A \in \mathcal{B}(D([0,\delta]; F))$ and $z \in E$. This assumption is the direct
counterpart of nondegeneracy for discrete time hidden Markov models.
4.4.2. Local mixing in continuous time. The aim of this section is to
obtain a continuous-time version of Corollary 4.12 (in the setting of hidden
Markov processes). To this end, we assume that the state space $E$ of the
unobserved process $x_t$ is contained in a countable product $E \subseteq \prod_{i\in I} E^i$,
where each $E^i$ is Polish. Let $x_t^J$ be the projection of $x_t$ on $\prod_{i\in J} E^i$, and

$$\mathcal{F}^J_{s,t} = \sigma\{x^J_{s,t}, y_{s,t}\} \qquad \text{for } J \subseteq I,\ s \le t$$

(here $x^J_{s,t}$ denotes the path segment of $x^J$ on $[s,t]$, as for $x_{s,t}$).

Let $\mathcal{E}^J \subseteq \mathcal{B}(E)$ be the cylinder $\sigma$-field generated by the coordinates $J \subseteq I$.

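Theorem 4.20 (Continuous local mixing filter stability). If the Markov
additive process $(x_t, y_t)_{t\ge 0}$ is nondegenerate and locally mixing in the sense

$$\|\mathbf{P}^x - \mathbf{P}\|_{\mathcal{F}^J_{t,\infty}} \xrightarrow{t\to\infty} 0 \qquad \text{for all } x \in E \text{ and } J \subseteq I,\ |J| < \infty,$$

then the filter is stable in the sense that

$$\|\pi_t^\mu - \pi_t^\nu\|_{\mathcal{E}^J} \xrightarrow{t\to\infty} 0 \quad \text{in } \mathbf{P}^\gamma\text{-probability for all } J \subseteq I,\ |J| < \infty$$

for every $\mu, \nu, \gamma \in \mathcal{P}(E)$.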

We will reduce the proof to the discrete time case. The key to this reduc- 
tion is the following lemma, essentially due to Blackwell and Dubins [2]. 

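Lemma 4.21. Let $\mathbf{R}$ and $\mathbf{R}'$ be probabilities on $D(\mathbb{R}_+; F)$. Let $r_t$ be the
coordinate process of $D(\mathbb{R}_+; F)$ and $\mathcal{G}_t = \sigma\{r_s : s \in [0,t]\}$. If $\mathbf{R} \ll \mathbf{R}'$, then

$$\|\mathbf{R}[\,\cdot\,|\mathcal{G}_t] - \mathbf{R}'[\,\cdot\,|\mathcal{G}_t]\| \xrightarrow{t\to\infty} 0 \quad \text{in } \mathbf{R}\text{-probability}.$$

Proof. Let $\Lambda = d\mathbf{R}/d\mathbf{R}'$. Then the Bayes formula yields

$$\mathbf{R}[A|\mathcal{G}_t] - \mathbf{R}'[A|\mathcal{G}_t] = \mathbf{E}_{\mathbf{R}'}\bigg[1_A\, \frac{\Lambda - \mathbf{E}_{\mathbf{R}'}[\Lambda|\mathcal{G}_t]}{\mathbf{E}_{\mathbf{R}'}[\Lambda|\mathcal{G}_t]}\bigg|\mathcal{G}_t\bigg] \quad \mathbf{R}\text{-a.s.}$$

As $D(\mathbb{R}_+; F)$ is countably generated, it follows that

$$\|\mathbf{R}[\,\cdot\,|\mathcal{G}_t] - \mathbf{R}'[\,\cdot\,|\mathcal{G}_t]\| \le \frac{\mathbf{E}_{\mathbf{R}'}[\Delta_t|\mathcal{G}_t]}{\mathbf{E}_{\mathbf{R}'}[\Lambda|\mathcal{G}_t]} \quad \mathbf{R}\text{-a.s.}, \qquad \Delta_t = |\Lambda - \mathbf{E}_{\mathbf{R}'}[\Lambda|\mathcal{G}_t]|.$$

But note that $\mathbf{E}_{\mathbf{R}'}[\Lambda|\mathcal{G}_t] \to \Lambda$ and therefore $\Delta_t \to 0$ in $\mathbf{R}'$-probability by
the martingale convergence theorem (a right-continuous modification of the
martingale is not needed for convergence in probability), while $\Lambda > 0$ $\mathbf{R}$-a.s.
The remaining steps of the proof follow the proof of Lemma 4.9. □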
We now turn to the proof of Theorem 4.20.

Proof of Theorem 4.20. Let $\bar E = D([0,1]; E)$, $\bar F = D([0,1]; F)$, $X_n =
x_{n,n+1}$, and $Y_n = y_{n,n+1}$. Then $(X_n, Y_n)_{n\ge 0}$ is a nondegenerate hidden
Markov model in $\bar E \times \bar F$ under $\mathbf{P}^\mu$ for every $\mu \in \mathcal{P}(E)$: in particular, $(X_n)_{n\ge 0}$
is a Markov chain with initial measure $\bar\mu$ and transition kernel $P_0$ given by

$$\bar\mu(d\xi) = \mathbf{P}^\mu[x_{0,1} \in d\xi], \qquad P_0(\xi, d\xi') = \mathbf{P}^{\xi(1)}[x_{0,1} \in d\xi'],$$

while, by the nondegeneracy assumption, the observation kernel $\Phi$ is

$$\Phi(\xi, d\eta) = g_1(\xi,\eta)\,\varphi_1(d\eta)$$

(so that, as in section 4.3.2, $(X_n, Y_n)_{n\ge 0}$ is the Markov chain with transition
kernel $P(\xi,\eta,d\xi',d\eta') = P_0(\xi,d\xi')\,\Phi(\xi',d\eta')$). Moreover, $\bar\lambda = \mathbf{P}[x_{0,1} \in \cdot\,]$ is
an invariant probability for the discrete-time model $(X_n, Y_n)_{n\ge 0}$.

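We can now apply Corollary 4.12 to the discrete-time model $(X_n, Y_n)_{n\ge 0}$.
Indeed, we can decompose $\bar E \subseteq \prod_{i\in I} \bar E^i$ with $\bar E^i = D([0,1]; E^i)$, and our
local mixing assumption directly implies that the discrete model $(X_n, Y_n)_{n\ge 0}$
is locally mixing with respect to this decomposition. It follows that

$$\|\mathbf{P}^\mu[x^J_{n,n+1} \in \cdot\,|\mathcal{F}^Y_{0,n+1}] - \mathbf{P}[x^J_{n,n+1} \in \cdot\,|\mathcal{F}^Y_{0,n+1}]\| \xrightarrow{n\to\infty} 0 \quad \mathbf{P}\text{-a.s.}$$

for all $J \subseteq I$, $|J| < \infty$ and $\mu \in \mathcal{P}(E)$ by Corollary 4.12, while Corollary 4.11
yields the equivalence $\mathbf{P}^\mu|_{\mathcal{F}^Y_{0,\infty}} \sim \mathbf{P}|_{\mathcal{F}^Y_{0,\infty}}$. The latter implies that

$$\|\mathbf{P}^\mu[y_{0,\infty} \in \cdot\,|\mathcal{F}^Y_{0,t}] - \mathbf{P}[y_{0,\infty} \in \cdot\,|\mathcal{F}^Y_{0,t}]\| \xrightarrow{t\to\infty} 0 \quad \text{in } \mathbf{P}\text{-probability}$$

by Lemma 4.21, which we now proceed to exploit.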
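Let $t \in [n, n+1]$ for some $n \in \mathbb{N}$. Then we have

$$\pi_t^\mu = \mathbf{E}^\mu[\mathbf{P}^\mu[x_t \in \cdot\,|\mathcal{F}^Y_{0,n+1}]\,|\mathcal{F}^Y_{0,t}], \qquad \pi_t = \mathbf{E}[\mathbf{P}[x_t \in \cdot\,|\mathcal{F}^Y_{0,n+1}]\,|\mathcal{F}^Y_{0,t}].$$

We can therefore estimate

$$\|\pi_t^\mu - \pi_t\|_{\mathcal{E}^J} \le \|\mathbf{P}^\mu[y_{0,\infty} \in \cdot\,|\mathcal{F}^Y_{0,t}] - \mathbf{P}[y_{0,\infty} \in \cdot\,|\mathcal{F}^Y_{0,t}]\|
 + \mathbf{E}\big[\|\mathbf{P}^\mu[x_t \in \cdot\,|\mathcal{F}^Y_{0,n+1}] - \mathbf{P}[x_t \in \cdot\,|\mathcal{F}^Y_{0,n+1}]\|_{\mathcal{E}^J}\,\big|\mathcal{F}^Y_{0,t}\big].$$

It follows that $\|\pi_t^\mu - \pi_t\|_{\mathcal{E}^J} \to 0$ as $t \to \infty$ in $\mathbf{P}$-probability. As $\mu$ was
arbitrary, the proof is easily completed using the triangle inequality and the
equivalence of all observation laws to $\mathbf{P}|_{\mathcal{F}^Y_{0,\infty}}$ as established above. □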
Remark 4.22. As we have seen above, deducing filter stability in continuous
time from our discrete time results requires some additional arguments
(a slightly longer argument will be used below in the setting of asymptotic
coupling). Let us therefore note, for sake of completeness, that the corresponding
results on the ergodicity of the filtering process $(\pi_t)_{t\ge 0}$ as in section
4.3.3 follow immediately from their discrete-time counterparts: in fact,
uniqueness of the invariant measure of any continuous-time Markov process
is evidently implied by uniqueness for the discretely sampled process.
There is therefore no need to consider this question separately.

4.4.3. Asymptotic coupling in continuous time. We now turn to the problem
of obtaining a continuous-time counterpart to our asymptotic coupling
filter stability Theorem 4.13. To this end we will assume, as we have done
throughout this section, that $(x_t, y_t)_{t\ge 0}$ is a Markov additive process in the
Polish state space $E \times F$. In addition, we will assume in this subsection that
the unobserved process $(x_t)_{t\ge 0}$ has continuous sample paths. While this is
not absolutely essential, the restriction to continuous processes facilitates
the treatment of asymptotic couplings in continuous time.

The following is the main result of this section. As in Theorem 4.13, we 
will fix in the following a complete metric d for the Polish space E. 

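Theorem 4.23 (Continuous weak-* filter stability). Let $(x_t, y_t)_{t\ge 0}$ be a
nondegenerate Markov additive process that admits an invariant probability
$\lambda$, and assume that the unobserved process $(x_t)_{t\ge 0}$ has continuous sample
paths. Moreover, let $\tilde d(x,y) \ge d(x,y)$ for all $x, y \in E$, fix $\Delta > 0$, and define
the intervals $I_n = [n\Delta, (n+1)\Delta]$. Suppose that the following hold:

a. There exists $\alpha > 0$ such that

$$\forall\, x, x' \in E \quad \exists\, \mathbf{Q} \in \mathcal{C}(\mathbf{P}^x, \mathbf{P}^{x'}) \quad \text{s.t.} \quad \mathbf{Q}\bigg[\sum_{n=1}^\infty \sup_{t\in I_n} \tilde d(x_t, x_t')^2 < \infty\bigg] \ge \alpha.$$

b. There exists $C < \infty$ such that for all $\delta \le \Delta$ and $\xi, \xi' \in C([0,\delta]; E)$

$$\int \Big\{\sqrt{g_\delta(\xi,\eta)} - \sqrt{g_\delta(\xi',\eta)}\Big\}^2 \varphi_\delta(d\eta) \le C \sup_{t\in[0,\delta]} \tilde d(\xi(t), \xi'(t))^2.$$

Then the filter is stable in the sense that

$$\|\pi_t^\mu - \pi_t^\nu\|_{\mathrm{BL}} \xrightarrow{t\to\infty} 0 \quad \text{in } \mathbf{P}^\gamma\text{-probability for all } \mu, \nu, \gamma \in \mathcal{P}(E).$$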
Proof. We begin by noting that if assumption a holds for $\Delta$, then this
assumption also holds if $\Delta$ is replaced by $\Delta/r$ for some $r \in \mathbb{N}$. Indeed, as

$$\sum_{n=1}^\infty \sup_{t\in I_n} \tilde d(x_t, x_t')^2 \ge \frac{1}{r} \sum_{k=1}^r \sum_{n=1}^\infty \sup_{t\in[(n+(k-1)/r)\Delta,\,(n+k/r)\Delta]} \tilde d(x_t, x_t')^2,$$

the claim follows. Fix $r \in \mathbb{N}$ for the time being. Define $\bar E = C([0,\Delta/r]; E)$,
$\bar F = D([0,\Delta/r]; F)$, $X_n = x_{t_n,t_{n+1}}$, and $Y_n = y_{t_n,t_{n+1}}$, where $t_n = n\Delta/r$.
Then it follows as in the proof of Theorem 4.20 that $(X_n, Y_n)_{n\ge 0}$ is a
nondegenerate hidden Markov model in $\bar E \times \bar F$ that admits an invariant
probability (note that the definition of $\bar E$ takes into account that $(x_t)_{t\ge 0}$
has continuous sample paths). Moreover, if we endow $\bar E$ with the metric
$\bar d(\xi,\xi') = \sup_{t\in[0,\Delta/r]} d(\xi(t), \xi'(t))$, then evidently the assumptions of Theorem
4.13 are satisfied. It follows that for any $f \in \mathrm{Lip}(\bar E)$ and $\mu \in \mathcal{P}(E)$

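$$\big|\mathbf{E}^\mu[f(X_n)|\mathcal{F}^Y_{0,t_{n+1}}] - \mathbf{E}[f(X_n)|\mathcal{F}^Y_{0,t_{n+1}}]\big| \xrightarrow{n\to\infty} 0$$

$\mathbf{P}$-a.s. To proceed, let us fix $g \in \mathrm{Lip}(E)$, and define $G(\xi_{0,\Delta/r}) = g(\xi(0))$ and
$\tilde G(\xi_{0,\Delta/r}) = \sup_{s\in[0,\Delta/r]} |g(\xi(0)) - g(\xi(s))|$. We can easily estimate

$$\big|\mathbf{E}^\mu[g(x_t)|\mathcal{F}^Y_{0,t_n}] - \mathbf{E}[g(x_t)|\mathcal{F}^Y_{0,t_n}]\big| \le \big|\mathbf{E}^\mu[G(x_{t_{n-1},t_n})|\mathcal{F}^Y_{0,t_n}] - \mathbf{E}[G(x_{t_{n-1},t_n})|\mathcal{F}^Y_{0,t_n}]\big|
 + \mathbf{E}^\mu[\tilde G(x_{t_{n-1},t_n})|\mathcal{F}^Y_{0,t_n}] + \mathbf{E}[\tilde G(x_{t_{n-1},t_n})|\mathcal{F}^Y_{0,t_n}]$$

for any $t \in [t_{n-1}, t_n]$. As $G$ and $\tilde G$ are $\bar d$-Lipschitz, we obtain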
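$$\limsup_{n\to\infty} \sup_{t\in[t_{n-1},t_n]} \mathbf{E}\big[\big|\mathbf{E}^\mu[g(x_t)|\mathcal{F}^Y_{0,t_n}] - \mathbf{E}[g(x_t)|\mathcal{F}^Y_{0,t_n}]\big|\big] \le 2\,\mathbf{E}\bigg[\sup_{s\in[0,\Delta/r]} |g(x_0) - g(x_s)|\bigg].$$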

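On the other hand, we can estimate as in the proof of Theorem 4.20

$$|\pi_t^\mu(g) - \pi_t(g)| \le \|g\|_\infty\, \|\mathbf{P}^\mu[y_{0,\infty} \in \cdot\,|\mathcal{F}^Y_{0,t}] - \mathbf{P}[y_{0,\infty} \in \cdot\,|\mathcal{F}^Y_{0,t}]\|
 + \mathbf{E}\big[\big|\mathbf{E}^\mu[g(x_t)|\mathcal{F}^Y_{0,t_n}] - \mathbf{E}[g(x_t)|\mathcal{F}^Y_{0,t_n}]\big|\,\big|\mathcal{F}^Y_{0,t}\big]$$

for $t \in [t_{n-1}, t_n]$. Applying Lemma 4.21 as in Theorem 4.20 yields

$$\limsup_{t\to\infty} \mathbf{E}[|\pi_t^\mu(g) - \pi_t(g)|] \le 2\,\mathbf{E}\bigg[\sup_{s\in[0,\Delta/r]} |g(x_0) - g(x_s)|\bigg].$$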

But note that this holds for any $r \in \mathbb{N}$. Letting $r \to \infty$, we obtain

$$|\pi_t^\mu(g) - \pi_t(g)| \xrightarrow{t\to\infty} 0 \quad \text{in } \mathbf{P}\text{-probability}$$

using the continuity of paths. Finally, note that $g \in \mathrm{Lip}(E)$ is arbitrary.
We can therefore strengthen the convergence for individual $g$ to $\|\cdot\|_{\mathrm{BL}}$-convergence
as in the proof of Theorem 4.13. The proof is now easily completed
using the triangle inequality and the equivalence of all observation
laws to $\mathbf{P}|_{\mathcal{F}^Y_{0,\infty}}$ as established in the proof of Theorem 4.20. □

5. Examples. Infinite-dimensional Markov processes arise in a diverse 
range of applications, see, for example, the monograph [9]. The aim of this 
section is to demonstrate that the theory that we have developed in the pre- 
vious sections is directly applicable in several different settings. In section 
5.1, we consider the simplest possible example of an infinite-dimensional sys- 
tem: a stochastic heat equation with smooth forcing and point observations. 
While this example is nearly trivial, it allows us to easily demonstrate our 
results in the simplest possible setting. In section 5.2, we consider a highly 
degenerate stochastic Navier-Stokes equation with Eulerian observations. In 
section 5.3 we consider stochastic spin systems. Finally, in section 5.4 we 
consider filtering problems for stochastic differential delay equations. 

Our original motivation for developing the theory in this paper was to un- 
derstand the nonlinear filtering problem for stochastic Navier-Stokes equa- 
tions with Lagrangian observations of the position of a passive tracer, cf. [38, 
section 3.6] (such problems arise in oceanography; [38] also describes sev- 
eral other infinite-dimensional filtering problems that arise in applications). 
This example certainly fits in our general theory, the main difficulty being 
verification of the (unconditional) ergodicity properties of the passive tracer 
which have not been previously considered in the literature. In order not to 
unduly lengthen this paper, the details will be given elsewhere. As the aim 
of this section is to illustrate the wide applicability of our abstract theory, 
rather than the development of a specific application, we restrict attention 
to examples that are readily studied using existing ergodicity results. 

5.1. Stochastic heat equation. We investigate the following example from
[28]. Consider the stochastic heat equation on the unit interval $z \in [0,1]$:

$$du(t,z) = \Delta u(t,z)\,dt + dw(t,z), \qquad u(t,0) = u(t,1) = 0.$$

Here $dw(t,z)$ is the white in time, smooth in space random forcing

$$w(t,z) = \sum_{k=1}^\infty \sigma_k \sqrt{2}\,\sin(\pi k z)\,W_t^k,$$

$(W_t^k)_{t\ge 0}$, $k \in \mathbb{N}$ are independent Brownian motions, $\sum_{k=1}^\infty \sigma_k^2 < \infty$ and
$\sigma_k > 0$ for all $k \in \mathbb{N}$. We will assume that $u(t,z)$ is observed at the points
$z_1, \ldots, z_n \in [0,1]$ and that the observations are corrupted by independent
white noise: that is, we introduce the $\mathbb{R}^n$-valued observation model

$$dy_t^i = u(t, z_i)\,dt + dB_t^i, \qquad i = 1, \ldots, n,$$

where $(B_t^i)_{t\ge 0}$, $i = 1, \ldots, n$ are independent Brownian motions that are
independent of $(W_t^k)_{t\ge 0}$, $k \in \mathbb{N}$. As we are working with Dirichlet boundary
conditions, we view $z \mapsto u(t,z)$ as taking values in the Hilbert subspace
$H \subset L^2[0,1]$ spanned by the eigenfunctions $(e_k)_{k\in\mathbb{N}}$, $e_k(z) = \sqrt{2}\sin(\pi k z)$.

Lemma 5.1. Let $x_t = u(t,\cdot)$. Then the pair $(x_t, y_t)_{t\ge 0}$ defines a nondegenerate
Markov additive process in $H \times \mathbb{R}^n$ with continuous paths. Moreover,
the unobserved process $(x_t)_{t\ge 0}$ admits a unique invariant probability $\lambda$.

Proof. It is easily seen that for any $u(0,\cdot) \in H$, the equation for $u(t,\cdot)$
has a unique mild solution in $H$ that has continuous paths and satisfies the
Markov property (cf. [9]). If we expand $x_t = \sum_{k=1}^\infty x_t^k e_k$, then evidently

$$dx_t^k = -\pi^2 k^2 x_t^k\,dt + \sigma_k\,dW_t^k.$$

By Itô's formula, we obtain

$$\mathbf{E}[\|x_t\|_H^2] + \mathbf{E}\bigg[\int_0^t 2\|x_s\|_{H^1}^2\,ds\bigg] = \|x_0\|_H^2 + \|\sigma\|_H^2\, t,$$

where we defined the Sobolev norm $\|x_t\|_{H^s}^2 = \sum_{k=1}^\infty (\pi k)^{2s} (x_t^k)^2$. Note that
$|u(t,z)| \le \sqrt{2}\sum_{k=1}^\infty |x_t^k| \le 3^{-1/2}\|x_t\|_{H^1}$ by Cauchy-Schwarz. Thus $z \mapsto
u(t,z)$ is continuous for a.e. $t$, so the observation process $y_t$ is well defined
and the pair $(x_t, y_t)_{t\ge 0}$ defines a Markov additive process. Moreover,

$$\mathbf{E}\bigg[\int_0^t |u(s, z_i)|^2\,ds\bigg] \le \frac{\|x_0\|_H^2 + \|\sigma\|_H^2\, t}{6} < \infty.$$

Therefore, by Girsanov's theorem, the conditional law of $y_{0,t}$ given $x_{0,t}$ is
equivalent to the Wiener measure a.s. for any $t < \infty$ and $x_0 \in H$ (as
$(u(t,z_i))_{t\ge 0}$ and $(B_t^i)_{t\ge 0}$ are independent, Novikov's criterion can be applied
conditionally). This establishes the nondegeneracy assumption. Finally, as
each Fourier mode $x_t^k$ is an independent Ornstein-Uhlenbeck process, it is
easily seen by explicit computation that the law of $x_t$ converges weakly as
$t \to \infty$ to a unique Gaussian product measure $\lambda$ for any $x_0 \in H$. □
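The explicit computations invoked in this proof are easy to check numerically. The sketch below is an illustration only: it assumes the mode equation $dx^k = -\pi^2 k^2 x^k\,dt + \sigma_k\,dW^k$, uses the standard exact Gaussian transition of an Ornstein-Uhlenbeck process, and evaluates the Cauchy-Schwarz constant $\sqrt{2}\,(\sum_k (\pi k)^{-2})^{1/2} = 3^{-1/2}$ by a partial sum. All function names are hypothetical.

```python
import math
import random


def ou_mode_step(x, k, sigma_k, dt, rng):
    """Exact one-step update of dx = -pi^2 k^2 x dt + sigma_k dW over a time step dt."""
    a = math.pi ** 2 * k ** 2
    decay = math.exp(-a * dt)
    # Variance of the Gaussian transition of the Ornstein-Uhlenbeck mode.
    var = sigma_k ** 2 * (1.0 - decay ** 2) / (2.0 * a)
    return decay * x + math.sqrt(var) * rng.gauss(0.0, 1.0)


def stationary_variance(k, sigma_k):
    """Variance of the k-th coordinate under the invariant Gaussian product measure."""
    return sigma_k ** 2 / (2.0 * math.pi ** 2 * k ** 2)


def sup_norm_constant(n_terms):
    """Partial sum approximation of sqrt(2) * (sum_k (pi k)^-2)^(1/2), which increases to 3^(-1/2)."""
    return math.sqrt(2.0 * sum((math.pi * k) ** -2 for k in range(1, n_terms + 1)))
```

For a long time step the initial condition is forgotten and `ou_mode_step` samples (up to rounding) from the centered Gaussian with variance `stationary_variance(k, sigma_k)`, which is the coordinate-wise description of the invariant measure.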



It is evident from Lemma 5.1 that the ergodic theory of $u(t,z)$ is quite
trivial: each of the Fourier modes is an independent ergodic one-dimensional
Ornstein-Uhlenbeck process (recall Example 2.3). Nonetheless, the reader
may easily verify using the Kakutani theorem [36, p. 531] that $(x_t)_{t\ge 0}$ is
not Harris when the forcing is sufficiently smooth (for example, $\sigma_k = e^{-k^3}$).
Moreover, the finite-dimensional projections $(x_t^1, \ldots, x_t^k, y_t)$ are not Markovian.
Thus stability of the corresponding nonlinear filter does not follow from
earlier results. While this example remains essentially trivial, it is nonetheless
instructive to illustrate our results in this simplest possible setting.

5.1.1. Local mixing. To every $x = \sum_{k=1}^\infty x^k e_k \in H$ we identify a vector
of Fourier coefficients $(x^k)_{k\in\mathbb{N}} \in \mathbb{R}^{\mathbb{N}}$. In order to apply our local mixing
results, we can therefore view $H$ as a subset of the product space $\mathbb{R}^{\mathbb{N}}$. Note
that $H$ is certainly not a topological subspace of $\mathbb{R}^{\mathbb{N}}$ (pointwise convergence
of the Fourier coefficients does not imply convergence in $H$); however, $H$
is a measurable subspace of $\mathbb{R}^{\mathbb{N}}$, which is all that is needed in the present
setting.

For every $k \in \mathbb{N}$, define the local $\sigma$-fields

$$\mathcal{F}^k_{s,t} = \sigma\{x^1_{s,t}, \ldots, x^k_{s,t}, y_{s,t}\}, \qquad s \le t.$$

To apply Theorem 4.20, it suffices to establish the local mixing property.



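Lemma 5.2. The Markov additive process $(x_t, y_t)_{t\ge 0}$ is locally mixing:

$$\|\mathbf{P}^x - \mathbf{P}\|_{\mathcal{F}^k_{t,\infty}} \xrightarrow{t\to\infty} 0 \qquad \text{for every } x \in H,\ k \in \mathbb{N}.$$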



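Proof. Let $x, x' \in H$, and define $v \in H$ such that $\langle e_\ell, v\rangle = \langle e_\ell, x\rangle$ for
$1 \le \ell \le k$, and $\langle e_\ell, v\rangle = \langle e_\ell, x'\rangle$ for $\ell > k$. It is easily seen that

$$\|\mathbf{P}^x - \mathbf{P}^{x'}\|_{\mathcal{F}^k_{t,\infty}} \le \|\mathbf{P}^x - \mathbf{P}^v\|_{\mathcal{F}^k_{t,\infty}} + \|\mathbf{P}^v - \mathbf{P}^{x'}\|_{\sigma\{x_t\}}$$

by the Markov additive property. As the Fourier modes are independent, we
evidently have $\|\mathbf{P}^v - \mathbf{P}^{x'}\|_{\sigma\{x_t\}} = \|\mathbf{P}^x - \mathbf{P}^{x'}\|_{\sigma\{x_t^1,\ldots,x_t^k\}} \to 0$ as $t \to \infty$ (for
example, by explicit computation of the law of the $k$-dimensional Ornstein-Uhlenbeck
process). It therefore remains to consider the first term.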

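Construct on a larger probability space $(\Omega', \mathcal{F}', \mathbf{Q})$ the triple $(x_t, v_t, y_t)_{t\ge 0}$
as follows. The processes $x_t$ and $v_t$ are solutions to the stochastic heat equation
driven by the same Brownian motion realization, but with different
initial conditions $x_0 = x$ and $v_0 = v$, while $dy_t^i = x_t(z_i)\,dt + dB_t^i$ as above.
Now note that we can show precisely as in the proof of Lemma 5.1 that

$$\mathbf{E}_{\mathbf{Q}}\bigg[\int_0^\infty |v_s(z_i) - x_s(z_i)|^2\,ds\bigg] \le \mathbf{E}_{\mathbf{Q}}\bigg[\int_0^\infty \frac{1}{3}\|v_s - x_s\|_{H^1}^2\,ds\bigg] \le \frac{\|x - v\|_H^2}{6} < \infty.$$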






XIN THOMSON TONG AND RAMON VAN HANDEL 



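As $(x_t, v_t)_{t\ge 0}$ is independent of $(B_t)_{t\ge 0}$, we can apply Novikov's criterion
conditionally to establish that $\mathbf{E}_{\mathbf{Q}}[\Lambda_t] = 1$ for any $t \ge 0$, where we define

$$\Lambda_t = \exp\bigg[\sum_{i=1}^n \bigg\{\int_t^\infty \{v_s(z_i) - x_s(z_i)\}\,dB_s^i - \frac{1}{2}\int_t^\infty |v_s(z_i) - x_s(z_i)|^2\,ds\bigg\}\bigg].$$

Using Girsanov's theorem, we obtain for any $A \in \mathcal{F}^k_{t,\infty}$

$$\mathbf{P}^v(A) = \mathbf{E}_{\mathbf{Q}}[1_A(v^1_{t,\infty}, \ldots, v^k_{t,\infty}, y_{t,\infty})\,\Lambda_t] = \mathbf{E}_{\mathbf{Q}}[1_A(x^1_{t,\infty}, \ldots, x^k_{t,\infty}, y_{t,\infty})\,\Lambda_t],$$

where we have used that $v_t^\ell = x_t^\ell$ for all $t \ge 0$ when $\ell \le k$. Moreover, the law
of $(x_t, y_t)_{t\ge 0}$ under $\mathbf{Q}$ obviously coincides with $\mathbf{P}^x$. We therefore conclude
that $\|\mathbf{P}^x - \mathbf{P}^v\|_{\mathcal{F}^k_{t,\infty}} \le \mathbf{E}_{\mathbf{Q}}[|\Lambda_t - 1|] \to 0$ as $t \to \infty$ by Scheffé's lemma. □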
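In this example the last step can be made fully explicit, because the two solutions are driven by the same noise and their difference is deterministic. The sketch below rests on assumptions that go beyond the text: the density is taken with stochastic integrals over $[t,\infty[$, so that $\log\Lambda_t$ is Gaussian with mean $-s_t^2/2$ and variance $s_t^2$, where $s_t^2 = \sum_i \int_t^\infty |v_s(z_i) - x_s(z_i)|^2\,ds$; in that case $\mathbf{E}|\Lambda_t - 1| = 2\,\mathrm{erf}(s_t/(2\sqrt{2}))$. The function names and the truncation to finitely many modes are hypothetical.

```python
import math


def residual_energy(p0, z_points, t, first_mode):
    """s_t^2 = sum_i int_t^infty |p_s(z_i)|^2 ds for a deterministic difference p_s.

    p0[j] is the initial Fourier coefficient of mode l = first_mode + j, which
    evolves as p0[j] * exp(-pi^2 l^2 s); modes below first_mode are assumed to
    agree and therefore do not contribute.
    """
    total = 0.0
    for z in z_points:
        for a, pa in enumerate(p0):
            la = first_mode + a
            for b, pb in enumerate(p0):
                lb = first_mode + b
                rate = math.pi ** 2 * (la ** 2 + lb ** 2)
                # Product of two eigenfunctions sqrt(2) sin(pi l z), integrated in time over [t, infinity).
                total += (2.0 * math.sin(math.pi * la * z) * math.sin(math.pi * lb * z)
                          * pa * pb * math.exp(-rate * t) / rate)
    return total


def l1_distance_from_one(s_squared):
    """E|Lambda - 1| when log(Lambda) is Gaussian with mean -s^2/2 and variance s^2."""
    s = math.sqrt(max(s_squared, 0.0))
    return 2.0 * math.erf(s / (2.0 * math.sqrt(2.0)))
```

Since the residual energy decays to zero as $t$ grows, so does `l1_distance_from_one(residual_energy(...))`, which is the quantitative content of the convergence used above in this Gaussian special case.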

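Let $\mathcal{E}^k = \sigma\{P_k\}$, where $P_k : H \to H$ is the projection onto the first $k$
Fourier modes. Theorem 4.20 immediately yields the filter stability result

$$\|\pi_t^\mu - \pi_t^\nu\|_{\mathcal{E}^k} \xrightarrow{t\to\infty} 0 \quad \text{in } \mathbf{P}^\gamma\text{-probability for all } k \in \mathbb{N},\ \mu, \nu, \gamma \in \mathcal{P}(H).$$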

A simple tightness argument can be used to deduce also filter stability in the 
bounded-Lipschitz norm from this statement. However, let us demonstrate 
instead how the latter can be obtained directly from Theorem 4.23. 

5.1.2. Asymptotic coupling. It was shown in Lemma 5.1 that $(x_t, y_t)_{t\ge 0}$
is nondegenerate. It follows from the proof that we may choose the reference
measure $\varphi_\delta$ to be the Wiener measure on $D([0,\delta]; \mathbb{R}^n)$ and that

$$g_\delta(\xi,\eta) = \prod_{i=1}^n \exp\bigg[\int_0^\delta \xi(s, z_i)\,d\eta^i(s) - \frac{1}{2}\int_0^\delta |\xi(s, z_i)|^2\,ds\bigg]$$

for $\xi \in C([0,\delta]; H) \cap L^2([0,\delta]; H^1)$ (for simplicity, let $g_\delta(\xi,\eta) = 1$ otherwise).
We begin by establishing the Lipschitz property of the observations.

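Lemma 5.3. For all $\delta \le 1$ and $\xi, \xi' \in C([0,\delta]; H)$

$$\int \Big\{\sqrt{g_\delta(\xi,\eta)} - \sqrt{g_\delta(\xi',\eta)}\Big\}^2 \varphi_\delta(d\eta) \le \frac{n}{12} \sup_{t\in[0,\delta]} \|\xi(t) - \xi'(t)\|_{H^1}^2.$$

Proof. The result is trivial unless $\xi, \xi' \in L^2([0,\delta]; H^1)$, in which case

$$\int \Big\{\sqrt{g_\delta(\xi,\eta)} - \sqrt{g_\delta(\xi',\eta)}\Big\}^2 \varphi_\delta(d\eta) = 2 - 2\exp\bigg[-\frac{1}{8}\sum_{i=1}^n \int_0^\delta |\xi(s,z_i) - \xi'(s,z_i)|^2\,ds\bigg].$$

Now use $1 - e^{-x} \le x$ and $|\xi(s,z_i) - \xi'(s,z_i)| \le 3^{-1/2}\|\xi(s) - \xi'(s)\|_{H^1}$. □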
Thus the second assumption of Theorem 4.23 is satisfied for $\tilde d(x,y) =
\|x - y\|_{H^1}$ and $\Delta = 1$. It is clear that the observations cannot be continuous
with respect to $\|\cdot\|_H$, which is the reason that we have introduced the
pseudodistance $\tilde d$ in Theorem 4.23. To establish filter stability, it remains to
produce an asymptotic coupling in $H^1$, which is trivial in this example.

Lemma 5.4. For all $x, x' \in H$, there exists $\mathbf{Q} \in \mathcal{C}(\mathbf{P}^x, \mathbf{P}^{x'})$ such that

$$\sum_{n=1}^\infty \sup_{t\in[n,n+1]} \|x_t - x_t'\|_{H^1}^2 < \infty \quad \mathbf{Q}\text{-a.s.}$$

Proof. Choose $\mathbf{Q}$ such that the processes $x_t$ and $x_t'$ are solutions to the
stochastic heat equation driven by the same Brownian motion realization,
but with different initial conditions $x_0 = x$ and $x_0' = x'$. Then

$$p_t = x_t - x_t', \qquad dp_t^k = -\pi^2 k^2 p_t^k\,dt$$

as in the proof of Lemma 5.1. As the difference $p_t$ is deterministic, the result
follows readily (for example, $\|p_t\|_{H^1}$ can be computed explicitly). □
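The explicit computation mentioned at the end of this proof can be sketched as follows. The code assumes that the difference has only finitely many nonzero Fourier coefficients (a truncation made purely for illustration) and uses $\|p_t\|_{H^1}^2 = \sum_k (\pi k)^2 e^{-2\pi^2 k^2 t} (p_0^k)^2$; the function names are hypothetical.

```python
import math


def h1_norm_sq(p0, t):
    """||p_t||_{H^1}^2 for initial coefficients p0 = [p0_1, p0_2, ...] of the difference."""
    return sum((math.pi * k) ** 2 * math.exp(-2.0 * math.pi ** 2 * k ** 2 * t) * c ** 2
               for k, c in enumerate(p0, start=1))


def coupling_sum(p0, n_max):
    """Partial sum over n = 1..n_max of sup_{t in [n, n+1]} ||p_t||_{H^1}^2.

    Every mode decays monotonically in t, so each supremum is attained at t = n.
    """
    return sum(h1_norm_sq(p0, n) for n in range(1, n_max + 1))


def coupling_sum_limit(p0):
    """Value of the full series: for each mode the sum over n is a geometric series."""
    total = 0.0
    for k, c in enumerate(p0, start=1):
        ratio = math.exp(-2.0 * math.pi ** 2 * k ** 2)
        total += (math.pi * k) ** 2 * c ** 2 * ratio / (1.0 - ratio)
    return total
```

The partial sums converge extremely fast to the closed form, since already the slowest mode contracts by a factor $e^{-2\pi^2}$ per unit of time; the factor $(\pi k)^2 e^{-2\pi^2 k^2}$ is bounded in $k$, which is why the series is finite for every initial difference in $H$.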

As we have verified all the assumptions of Theorem 4.23, it follows that

$$\|\pi_t^\mu - \pi_t^\nu\|_{\mathrm{BL}} \xrightarrow{t\to\infty} 0 \quad \text{in } \mathbf{P}^\gamma\text{-probability for all } \mu, \nu, \gamma \in \mathcal{P}(H),$$

that is, we have established filter stability in the bounded-Lipschitz norm.

Remark 5.5. Besides admitting a trivial ergodic theory, the example
considered in this section is special in that it is a linear Gaussian model. In finite
dimension, such filtering problems are amenable to explicit analysis as the
filter reduces to the well-known Kalman filter, which is a rather simple linear
equation [25]. Some results in this direction for linear stochastic evolution
equations were considered by Vinter [44]. However, the present example
does not fit in the setting of [44] as the observation operator $C : H \to \mathbb{R}^n$,
$Cu = (u(z_1), \ldots, u(z_n))$ is unbounded, which significantly complicates even
the definition of the Kalman filtering equations in infinite dimension. It is
therefore interesting to note the ease with which we have obtained stability
results from our general nonlinear theory even in this trivial linear example.

5.2. Stochastic Navier-Stokes equation. We now turn to a much less triv- 
ial example inspired by [38, section 3.6]: we will consider discrete time Eu- 
lerian (point) observations of the velocity of a fluid that is modeled by a 
Navier-Stokes equation with white in time, smooth in space random forcing. 
We consider a velocity field $u(t,z) \in \mathbb{R}^2$ on the two-dimensional torus
$z \in \mathbb{T}^2 = [-\pi,\pi]^2$ such that $\int u(t,z)\,dz = 0$ and $\nabla \cdot u(t,z) = 0$ for all $t \ge 0$.
The dynamics of $u(t,z)$ are given by the stochastic Navier-Stokes equation

$$du(t,z) = \{\nu\Delta u(t,z) - (u(t,z)\cdot\nabla)u(t,z) - \nabla p(t,z)\}\,dt + d\tilde w(t,z)$$

with periodic boundary conditions, where $\nu > 0$ is the fluid viscosity, $\tilde w$ is a
spatial mean zero stochastic forcing to be specified later, and the pressure $p$
is chosen to enforce the divergence-free constraint $\nabla \cdot u(t,z) = 0$.

To define the observations, let us fix points $z_1, \ldots, z_r \in \mathbb{T}^2$ at which the
fluid velocity is measured. We assume that measurements are taken at the
discrete time instants $t_n = n\delta$, $n \ge 0$, where we fix the sampling interval
$\delta > 0$ throughout this section. The observations are then given by^1

$$Y_n^i = u(t_n, z_i) + \xi_n^i, \qquad i = 1, \ldots, r,\ n \ge 0,$$

where $(\xi_n)_{n\ge 0}$ are i.i.d. $\mathbb{R}^{2r}$-dimensional Gaussian random variables with
nondegenerate covariance that are independent of $(u(t,\cdot))_{t\ge 0}$.

Following [16], it will be convenient to eliminate the divergence-free constraint
from the stochastic Navier-Stokes equation by passing to an equivalent
formulation. Define the vorticity $v(t,z) = \nabla \times u(t,z) = \partial u^1(t,z)/\partial z^2 -
\partial u^2(t,z)/\partial z^1$, which is a scalar field on $\mathbb{T}^2$. As $u$ is divergence-free and has
spatial mean zero, we can reconstruct the velocity field from the vorticity as
$u = \mathcal{K}v$, where the integral operator $\mathcal{K}$ is defined in the Fourier domain as
$\langle e_k, \mathcal{K}v\rangle = -i(k^\perp/|k|^2)\langle e_k, v\rangle$ with $e_k(z) = (2\pi)^{-1}e^{ik\cdot z}$, $k \in \mathbb{Z}^2\setminus\{(0,0)\}$, and
$k^\perp = (k^2, -k^1)$. In terms of vorticity, the Navier-Stokes equation reads

$$dv(t,z) = \{\nu\Delta v(t,z) - \mathcal{K}v(t,z)\cdot\nabla v(t,z)\}\,dt + dw(t,z),$$

where $w(t,z) = \nabla \times \tilde w(t,z)$, and the observation equation becomes

$$Y_n^i = \mathcal{K}v(t_n, z_i) + \xi_n^i, \qquad i = 1, \ldots, r,\ n \ge 0.$$
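The reconstruction of the velocity from the vorticity can be checked mode by mode. The sketch below assumes the Fourier multiplier $-i k^\perp/|k|^2$ with $k^\perp = (k^2, -k^1)$ and the sign convention $v = \partial u^1/\partial z^2 - \partial u^2/\partial z^1$ for the vorticity; both are read off from a damaged passage and should be treated as assumptions. The function names are hypothetical.

```python
def biot_savart_mode(k, v_hat):
    """Fourier coefficient of u = K v at wavevector k = (k1, k2) != (0, 0).

    Implements u_hat = -i * k_perp / |k|^2 * v_hat with k_perp = (k2, -k1).
    """
    k1, k2 = k
    norm_sq = k1 * k1 + k2 * k2
    factor = -1j * v_hat / norm_sq
    return (factor * k2, factor * (-k1))


def divergence_mode(k, u_hat):
    """Fourier coefficient of div u; differentiation in z^j is multiplication by i * k_j."""
    return 1j * (k[0] * u_hat[0] + k[1] * u_hat[1])


def vorticity_mode(k, u_hat):
    """Fourier coefficient of du^1/dz^2 - du^2/dz^1 (assumed sign convention)."""
    return 1j * (k[1] * u_hat[0] - k[0] * u_hat[1])
```

Under these assumptions, applying `vorticity_mode` to the output of `biot_savart_mode` returns the original coefficient, and `divergence_mode` returns zero, so the reconstructed field is divergence-free with the prescribed vorticity.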

From now on we will work with the vorticity equation, which we consider as
an evolution equation in the Hilbert space $H = \{v \in L^2(\mathbb{T}^2) : \int v(z)\,dz = 0\}$.
This formulation is equivalent to considering the original stochastic Navier-Stokes
equation in $\{u \in H^1 : \nabla \cdot u = 0, \int u(z)\,dz = 0\}$. We also define the
Sobolev norm $\|v\|_{H^s}^2 = \sum_k |k|^{2s}|\langle e_k, v\rangle|^2$ and $H^s = \{v \in H : \|v\|_{H^s} < \infty\}$.

^1 The observation equation makes sense when $u(t,\cdot) \in H^2$, as this implies that $z \mapsto
u(t,z)$ is continuous by the Sobolev embedding theorem. For concreteness, we can define
$Y_n^i = u(t_n, z_i)\,1_{u(t_n,\cdot)\in H^2} + \xi_n^i$, which makes sense for any velocity field. As we will always
work under assumptions that ensure sufficient smoothness of the solutions of the stochastic
Navier-Stokes equations for all $t > 0$, this minor point will not affect our results.
It remains to specify the structure of the forcing $w(t,z)$. As in [16], we let
$\mathbb{Z}_0^2 = \mathbb{Z}^2\setminus\{(0,0)\} = \mathbb{Z}_+^2 \cup \mathbb{Z}_-^2$ with $\mathbb{Z}_+^2 = \{k \in \mathbb{Z}^2 : k^2 > 0 \text{ or } k^2 = 0,\ k^1 > 0\}$
and $\mathbb{Z}_-^2 = -\mathbb{Z}_+^2$, and we define the trigonometric basis $f_k(z) = \sin(k\cdot z)$ for
$k \in \mathbb{Z}_+^2$ and $f_k(z) = \cos(k\cdot z)$ for $k \in \mathbb{Z}_-^2$. The forcing is now given by

$$w(t,z) = \sum_{k\in\mathbb{Z}_0^2} \sigma_k f_k(z)\,W_t^k,$$

where $(W_t^k)_{t\ge 0}$, $k \in \mathbb{Z}_0^2$ are independent standard Brownian motions, and
we will assume that $\sum_k |k|^2\sigma_k^2 < \infty$ (the forcing is in $H^1$).

Lemma 5.6. Let $X_n = v(t_n,\cdot)$. Then $(X_n, Y_n)_{n\ge 0}$ defines a nondegenerate
hidden Markov model in $H \times \mathbb{R}^{2r}$ that admits an invariant probability.

Proof. It is well known that the stochastic Navier-Stokes equation defines
a stochastic flow, see [16, 20] and the references therein. Under our
assumptions, this implies that the vorticity equation defines a Markov process
in $H$. Thus $(X_n, Y_n)_{n\ge 0}$ is evidently a hidden Markov model, and nondegeneracy
follows as the observation kernel has a nondegenerate Gaussian
density. Moreover, as we assumed that $\sum_k |k|^2\sigma_k^2 < \infty$, standard Sobolev estimates
(for example, [20, Proposition 2.4.12]) show that $v(t,\cdot) \in H^1$ for all
$t > 0$ a.s. for any initial condition $v(0,\cdot) \in H$. Thus $u(t,\cdot) = \mathcal{K}v(t,\cdot) \in H^2$
for all $t > 0$ a.s., and the observation model is defined as intended. The
existence of an invariant probability is standard (for example, [9, 20]). □

Our aim is now to establish stability of the nonlinear filter for the hidden
Markov model $(X_n, Y_n)_{n\ge 0}$. This is much more difficult than for the heat
equation in the previous section. First, in the present case the Fourier modes
are coupled by the nonlinear term in the equation, so that energy can move
across scales. Second, unlike in the heat equation example, only sufficiently
fine scales are contracting. Nonetheless, in the case that all Fourier modes are
forced (that is, $\sigma_k > 0$ for all $k \in \mathbb{Z}_0^2$), it is possible to establish local mixing
using the Girsanov method developed in [13, 27]. In fact, the approach taken
in these papers is well suited to our local zero-two laws (for example, Lemmas
3.1 and 3.2 in [13] can be used directly in conjunction with Corollary 2.8 to
establish absolute regularity of a finite number of Fourier modes, and some
additional effort yields the assumptions of Corollary 2.5). However, these
methods do not extend to the degenerate setting.

We intend to illustrate that our results are applicable even in highly de- 
generate situations. To this end, we adopt the following assumptions [16]. 






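Assumption 5.7. Let $\mathcal{Z} = \{k \in \mathbb{Z}_0^2 : \sigma_k \ne 0\}$ be the set of forced modes.
We assume that (a) $\mathcal{Z}$ is a finite set; (b) $\mathcal{Z} = -\mathcal{Z}$; (c) there exist $k, k' \in \mathcal{Z}$
with $|k| \ne |k'|$; (d) integer linear combinations of elements of $\mathcal{Z}$ generate $\mathbb{Z}^2$.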

It was shown by Hairer and Mattingly [16] that under these (essentially 
minimal) assumptions the stochastic Navier-Stokes equation is uniquely er- 
godic. In the remainder of this section, we will show that this assumption 
also ensures stability of the filter in the bounded-Lipschitz norm. Let us 
emphasize that no new ergodic theory is needed: we will simply verify the 
assumptions of Theorem 4.13 by a direct application of the machinery de- 
veloped in [16, 17], together with a standard interpolation argument. 

We will use the following tool to construct asymptotic couplings. 

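Theorem 5.8. Let $Q$ be a transition kernel on $H$, and consider a continuous
function $W : H \to [1,\infty[$. Suppose that for every $\varphi \in C_b^1(H)$

$$\|\nabla Q\varphi(x)\|_H \le W(x)\big(C_1\{Q\|\nabla\varphi\|_H^2(x)\}^{1/2} + C_2\|\varphi\|_\infty\big)$$

($\nabla$ denotes the Fréchet derivative). Assume moreover that for some $p > 1$

$$QW^{2p} \le C_3^2\, W^{2p-2}, \qquad 4C_1C_3 < 1.$$

Let $\mathbf{Q}^x$ be the law of the Markov chain $(X_n)_{n\ge 0}$ with transition kernel $Q$
and $X_0 = x$. Then there exists a coupling $\mathbf{Q}^{x,x'} \in \mathcal{C}(\mathbf{Q}^x, \mathbf{Q}^{x'})$ such that

$$\mathbf{Q}^{x,x'}\Big[\|X_n - X_n'\|_H \le C_2^{-1}2^{-(n+1)} \text{ for all } n \ge 1\Big] \ge \frac{1}{2}$$

whenever $\|x - x'\|_H \le (4C_2R)^{-1}$, $W^p(x) \le R$, $W^p(x') \le R$ for some $R > 1$.
Moreover, the map $(x,x') \mapsto \mathbf{Q}^{x,x'}$ can be chosen to be measurable.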

Proof. We have simply rephrased the proofs of Propositions 5.5 and
4.12 in [17], making explicit choices for the constants involved. □

Denote by $P_0(x,\cdot) = \mathbf{P}^x[X_1 \in \cdot\,]$ the transition kernel of $(X_n)_{n\ge 0}$. To
verify the assumptions of Theorem 5.8, we require the following deep result.
This is the combined statement of Proposition 4.15 and Lemma A.1 in [16].

Theorem 5.9. For every $\eta > 0$ and $C_1 > 0$, there exists $C_2 > 0$ so that

$$\|\nabla P_0\varphi(x)\|_H \le \exp(\eta\|x\|_H^2)\big(C_1\{P_0\|\nabla\varphi\|_H^2(x)\}^{1/2} + C_2\|\varphi\|_\infty\big)$$

for all $\varphi \in C_b^1(H)$ and $x \in H$. Moreover, there exist constants $\eta_0 > 0$ and
$C_3 > 0$ such that for every $0 < \eta' \le \eta_0$, $x \in H$, and $n \ge 1$ we have

$$\mathbf{E}^x[\exp(\eta'\|X_n\|_H^2)] \le C_3^2 \exp(\eta' e^{-\nu n\delta}\|x\|_H^2).$$
Finally, we require the following reachability lemma [12, Lemma 3.1]. 
Lemma 5.10. For any i?2 > 0, there exist n > 1 and q > such that 



inf 

Mh<Ri 



\Xn\\H<R2] >q>0- 



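Using these results, we can now obtain the following asymptotic coupling.

Corollary 5.11. There exists $\alpha > 0$ such that

$$\forall\, x,x' \in H, \quad \exists\, \mathbf{Q} \in \mathcal{C}(\mathbf{P}^x,\mathbf{P}^{x'}) \quad\text{s.t.}\quad \mathbf{Q}\Big[\sum_{n=1}^\infty \|X_n - X_n'\|_{H^1}^2 < \infty\Big] \ge \alpha.$$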



Proof. Let $W(x) = \exp(\eta\|x\|_H^2)$ with $\eta = (1 - e^{-\nu})\eta_0/2$, and define
$p = (1 - e^{-\nu})^{-1}$ and $C_1 = 1/(8C_3)$ (here $\eta_0$ and $C_3$ are as in Theorem 5.9).
Defining $C_2$ as in Theorem 5.9, it is easily verified that the assumptions of
Theorem 5.8 are satisfied for $Q = P_0$. Therefore, for any $u,u' \in H$ such that
$\|u\|_H \le R_2$ and $\|u'\|_H \le R_2$, there exists $\mathbf{Q}^{u,u'} \in \mathcal{C}(\mathbf{P}^u,\mathbf{P}^{u'})$ such that

$$\mathbf{Q}^{u,u'}\Big[\sup_{n\ge 1} 2^n\|X_n - X_n'\|_H < \infty\Big] \ge \frac12,$$

where we defined the constant $R_2 = (2\log 2)/\eta_0 \wedge (16C_2)^{-1}$. On the other
hand, define the constant $R_1 = \sqrt{1 + (\log 2 + 2\log C_3)/\eta_0}$. Then by Lemma
5.10, there exist $q > 0$ and $n_2 \ge 1$ (depending on $R_1$ and $R_2$ only) such that

$$\inf_{\|x\|_H \le R_1} \mathbf{P}^x[\|X_{n_2}\|_H \le R_2] \ge q > 0.$$

From now on, let us fix $x,x' \in H$. Define $n_1 = 2\log(\|x\|_H \vee \|x'\|_H)/\nu$.
Then $\mathbf{E}^u[\exp(\eta_0\|X_{n_1}\|_H^2)] \le C_3^2\exp(\eta_0)$ for $u = x,x'$ by Theorem 5.9. Using
Chebyshev's inequality, we obtain $\mathbf{P}^u[\|X_{n_1}\|_H \le R_1] \ge 1/2$ for $u = x,x'$.
We now construct the coupling $\mathbf{Q} \in \mathcal{C}(\mathbf{P}^x,\mathbf{P}^{x'})$ such that

$$\mathbf{Q}\big[X_{n_1+n_2,\infty}, X'_{n_1+n_2,\infty} \in \cdot \,\big|\, \mathcal{F}_{0,n_1+n_2}\big] = \mathbf{Q}^{X_{n_1+n_2},X'_{n_1+n_2}}.$$
Setting $\alpha = q^2/8$ (which does not depend on $x,x'$), it is now easily seen that

$$\mathbf{Q}\Big[\sup_{n\ge 1} 2^n\|X_n - X_n'\|_H < \infty\Big] \ge \alpha > 0.$$
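
The choice of constants in the first half of this proof can be checked by hand. The following is only a sketch, under the assumptions that the moment bound of Theorem 5.9 reads $\mathbf{E}^x[\exp(\eta'\|X_n\|_H^2)] \le C_3^2\exp(\eta' e^{-\nu n}\|x\|_H^2)$, that the Lyapunov condition of Theorem 5.8 reads $QW^{2p} \le C_3^2W^{2p-2}$ with $4C_1C_3 < 1$, and that the two chains are run independently up to time $n_1+n_2$ (which is what the value $\alpha = q^2/8$ suggests).

```latex
% Lyapunov condition for W(x) = exp(eta ||x||_H^2):
% here 2 p eta = eta_0 and 1 - 1/p = e^{-nu}, so (2p-2) eta = eta_0 e^{-nu}.
\begin{align*}
P_0W^{2p}(x) &= \mathbf{E}^x[\exp(\eta_0\|X_1\|_H^2)]
  \le C_3^2\exp(\eta_0 e^{-\nu}\|x\|_H^2)
  = C_3^2\exp\big((2p-2)\eta\|x\|_H^2\big) = C_3^2W^{2p-2}(x), \\
4C_1C_3 &= 4\cdot\frac{1}{8C_3}\cdot C_3 = \frac12 < 1.
\end{align*}

% Chebyshev step: n_1 is chosen so that e^{-nu n_1} ||u||_H^2 <= 1 for u = x, x'.
\begin{align*}
\mathbf{P}^u[\|X_{n_1}\|_H > R_1]
  \le e^{-\eta_0R_1^2}\,\mathbf{E}^u[\exp(\eta_0\|X_{n_1}\|_H^2)]
  \le C_3^2\,e^{\eta_0}\,e^{-\eta_0 - \log 2 - 2\log C_3} = \frac12 .
\end{align*}

% Lower bound on the coupling probability (independence before time n_1+n_2 assumed).
\begin{align*}
\mathbf{Q}\Big[\sup_{n\ge1}2^n\|X_n - X_n'\|_H < \infty\Big]
  \ge \frac12\,\mathbf{P}^x[\|X_{n_1+n_2}\|_H \le R_2]\,\mathbf{P}^{x'}[\|X_{n_1+n_2}\|_H \le R_2]
  \ge \frac12\Big(\frac{q}{2}\Big)^2 = \frac{q^2}{8}.
\end{align*}
```

In the last display each factor $q/2$ comes from the Markov property at time $n_1$: the chain lies in the ball of radius $R_1$ with probability at least $1/2$, and from there reaches the ball of radius $R_2$ after $n_2$ further steps with probability at least $q$.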



It remains to strengthen the $\|\cdot\|_H$-norm to $\|\cdot\|_{H^1}$ in this expression. To this
end, we employ an interpolation argument. Recall the interpolation inequality
$\|u\|_{H^1} \le \|u\|_H^{1/2}\|u\|_{H^2}^{1/2}$ (for example, [20, Property 1.1.4]). Therefore, in
order to complete the proof, it evidently suffices to show that

$$\mathbf{P}^u\Big[\sum_{n=1}^\infty 2^{-n}\|X_n\|_{H^2} < \infty\Big] = 1 \quad\text{for all } u \in H.$$

But as we assume that only finitely many Fourier modes are forced, we have
$\mathbf{E}^u[\|X_n\|_{H^2}] \le C(1 + \mathbf{E}^u[\|X_{n-1}\|_H^m])$ for some constants $m \ge 1$ and $C > 0$
independent of $n$ by a standard Sobolev estimate [20, Proposition 2.4.12].
As $\sup_n \mathbf{E}^u[\|X_n\|_H^m] < \infty$ by Theorem 5.9, the result follows readily. □
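
The interpolation step can be spelled out as follows. This is a sketch that assumes the statement of Corollary 5.11 concerns the event $\sum_n \|X_n - X_n'\|_{H^1}^2 < \infty$; the random variable $K$ is introduced here only for illustration.

```latex
% On the event K := sup_{n>=1} 2^n ||X_n - X_n'||_H < infinity, interpolation gives
\begin{align*}
\sum_{n=1}^\infty \|X_n - X_n'\|_{H^1}^2
  &\le \sum_{n=1}^\infty \|X_n - X_n'\|_H\,\|X_n - X_n'\|_{H^2} \\
  &\le K\sum_{n=1}^\infty 2^{-n}\big(\|X_n\|_{H^2} + \|X_n'\|_{H^2}\big).
\end{align*}

% The right-hand side is a.s. finite because its expectation is finite:
\begin{align*}
\mathbf{E}^u\Big[\sum_{n=1}^\infty 2^{-n}\|X_n\|_{H^2}\Big]
  \le \sum_{n=1}^\infty 2^{-n}\,C\Big(1 + \sup_{k\ge0}\mathbf{E}^u[\|X_k\|_H^m]\Big) < \infty .
\end{align*}
```

Since both marginals of the coupling are laws of the same chain, the second display applies to $(X_n)$ and to $(X_n')$ alike, so the $H^1$ sum is finite on the same event of probability at least $\alpha$.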

We can now verify the assumptions of Theorem 4.13. Note that for any
$u \in H^2$, we have $\|u\|_\infty \lesssim \|u\|_{H^2}$ by the Sobolev embedding theorem. In
particular, $\|\mathcal{K}v\|_\infty \lesssim \|v\|_{H^1}$ for any $v \in H^1$. We can therefore easily compute

$$\int \big[\sqrt{g(x,y)} - \sqrt{g(x',y)}\big]^2\,\varphi(dy) \le C\|x - x'\|_{H^1}^2 \quad\text{for all } x,x' \in H$$

as in Lemma 5.3. In view of Corollary 5.11, we have verified the assumptions
of Theorem 4.13 for $d(x,y) = \|x - y\|_{H^1}$. We therefore conclude that

$$\|\pi_n^\mu - \pi_n^\nu\|_{\mathrm{BL}} \xrightarrow{n\to\infty} 0 \quad\text{in } \mathbf{P}^\gamma\text{-probability for all } \mu,\nu,\gamma \in \mathcal{P}(E),$$

that is, we have established filter stability in the bounded-Lipschitz norm.

5.3. Stochastic spin systems. We now turn to an example of an essen-
tially different nature: we consider a stochastic spin system with counting
observations (this could serve as a stylized model, for example, of photocount
data from optical observations of a chain of ions in a linear trap). In this
setting, the unobserved process $(x_t)_{t\ge 0}$ describes the configuration of spins
in one dimension; that is, $x_t$ takes values in the space $E = \{0,1\}^{\mathbb{Z}}$, where
$x_t^i \in \{0,1\}$ denotes the state of spin $i \in \mathbb{Z}$ at time $t \ge 0$. The observations
$(y_t)_{t\ge 0}$ are modeled by a counting process, so that $y_t$ takes values in $F = \mathbb{Z}_+$.

To define the dynamics of $(x_t)_{t\ge 0}$, we introduce a function $c_i : E \to\, ]0,\infty[$
for every spin $i \in \mathbb{Z}$. We interpret $c_i(\sigma)$ as the rate at which spin $i$ flips when
the system is in the configuration $\sigma$. We will make the following assumptions.
Assumption 5.12. We assume the flip rates are (a) uniformly bounded:
$\sup_{i,\sigma} c_i(\sigma) < \infty$; (b) finite range: $c_i(\sigma)$ depends only on $\sigma_j$, $|i - j| \le R < \infty$;
(c) translation invariant: $c_i(\sigma) = c_{i+1}(\sigma')$ if $\sigma_j = \sigma'_{j+1}$ for all $j$.

The interpretation of $c_i(\sigma)$ is made precise by defining the pregenerator

$$\mathcal{L}f(\sigma) = \sum_{i\in\mathbb{Z}} c_i(\sigma)\{f(\sigma^i) - f(\sigma)\} \quad\text{for } \sigma \in E,\ f \in \mathcal{D},$$

where $\sigma^i_j = \sigma_j$ for $j \neq i$ and $\sigma^i_i = 1 - \sigma_i$, and $\mathcal{D}$ is the space of cylinder
functions on $E$. Then the closure of $\mathcal{L}$ in $C(E)$ is the generator of a Markov
semigroup [24, Ch. III], and we let $(x_t)_{t\ge 0}$ be the associated Markov process.
To ensure good ergodic properties of $(x_t)_{t\ge 0}$, we will assume the following.

Assumption 5.13. The spin system $(x_t)_{t\ge 0}$ is reversible with respect to
some probability $\lambda$. Moreover, the flip rates are attractive: if $\sigma \le \sigma'$, then
we have $c_i(\sigma) \le c_i(\sigma')$ if $\sigma_i = \sigma_i' = 0$ and $c_i(\sigma) \ge c_i(\sigma')$ if $\sigma_i = \sigma_i' = 1$.

It is known that under our assumptions, $\lambda$ is necessarily a Gibbs measure
[24, Theorem IV.2.13] (so this is a stochastic Ising model). The attractive
dynamics will tend to make neighboring spins agree; in this setting, $(x_t)_{t\ge 0}$
admits $\lambda$ as its unique invariant measure [24, Theorem IV.3.13].

To define the observations, we will fix a strictly positive continuous func-
tion $h : E \to\, ]0,\infty[$. The conditional law of $(y_t)_{t\ge 0}$ given $(x_t)_{t\ge 0}$ is modeled
as an inhomogeneous Poisson process with rate $\lambda_t = h(x_t)$.

Lemma 5.14. The pair $(x_t,y_t)_{t\ge 0}$ defines a nondegenerate Markov addi-
tive process in $\{0,1\}^{\mathbb{Z}} \times \mathbb{Z}_+$ that admits a unique invariant probability $\lambda$.

Proof. That $(x_t,y_t)_{t\ge 0}$ defines a Markov additive process is evident, and
the existence of a unique invariant probability under the assumptions of this
section was stated above. To establish nondegeneracy it suffices to note that
as $h$ is strictly positive, the conditional law of $y_{0,\delta}$ given $x_{0,\delta}$ is equivalent
to the law $\varphi_\delta$ of a unit-rate Poisson process by [25, Theorem 19.4]. □

We will require below the stronger assumption that the observation func-
tion $h$ is Lipschitz continuous with respect to a suitable metric. Note that for
any choice of scalars $\alpha_i > 0$ (for $i \in \mathbb{Z}$) such that $\sum_i \alpha_i < \infty$, the quantity

$$d(\sigma,\sigma') = \sum_{i\in\mathbb{Z}} \alpha_i\mathbf{1}_{\sigma_i \neq \sigma_i'}, \qquad \sigma,\sigma' \in E = \{0,1\}^{\mathbb{Z}}$$

metrizes the product topology of $\{0,1\}^{\mathbb{Z}}$. We will assume throughout this
section that $h$ is Lipschitz with respect to $d$ for a suitable choice of $(\alpha_i)_{i\in\mathbb{Z}}$.
We now aim to establish stability of the filter. As we can naturally write
$E = \prod_{i\in I} E^i$ with $I = \mathbb{Z}$ and $E^i = \{0,1\}$, we are in the setting of Theorem
4.20. To apply it, we must establish the local mixing property. To this end
we will use two essential tools: a uniform ergodicity result due to Holley and
Stroock [18], and the well-known Wasserstein coupling [24, Section III.1].
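
As a concrete illustration of the Lipschitz assumption, the following sketch uses the weights $\alpha_i = 2^{-|i|}$ and the observation function $h(\sigma) = 1 + \sum_i \alpha_i\sigma_i$. Both choices are hypothetical examples introduced here and do not come from the source.

```latex
% Summable weights: sum over i in Z of 2^{-|i|} = 1 + 2 * (1/2 + 1/4 + ...) = 3.
\begin{align*}
\sum_{i\in\mathbb{Z}} 2^{-|i|} = 1 + 2\sum_{k=1}^\infty 2^{-k} = 3 < \infty .
\end{align*}

% Configurations that agree on the window [-N, N] are close in d,
% which is why d generates the product topology:
\begin{align*}
\sigma_i = \sigma_i' \ \text{for } |i| \le N
  \quad\Longrightarrow\quad
  d(\sigma,\sigma') \le \sum_{|i|>N} 2^{-|i|} = 2^{1-N}.
\end{align*}

% The example h is strictly positive and Lipschitz with constant 1:
\begin{align*}
|h(\sigma) - h(\sigma')|
  = \Big|\sum_{i\in\mathbb{Z}} \alpha_i(\sigma_i - \sigma_i')\Big|
  \le \sum_{i\in\mathbb{Z}} \alpha_i\mathbf{1}_{\sigma_i\neq\sigma_i'} = d(\sigma,\sigma').
\end{align*}
```

In this example $1 \le h \le 4$, so the observation rate is bounded away from zero and infinity.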

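Proposition 5.15. $(x_t,y_t)_{t\ge 0}$ is locally mixing.

Proof. Fix a point $x \in E$ and a finite subset $J \subset \mathbb{Z}$, $|J| < \infty$ throughout
the proof, and let $\mathcal{F}^J_{s,t} = \sigma\{x^J_{s,t}, y_{s,t}\}$. It evidently suffices to show that

$$\|\mathbf{P}^x - \mathbf{P}^o\|_{\mathcal{F}^J_{t,\infty}} \xrightarrow{t\to\infty} 0,$$

where $o \in E$ is the zero configuration. Let $\mathbf{Q} \in \mathcal{C}(\mathbf{P}^o,\mathbf{P}^x)$ be the Wasserstein
coupling [24, Section III.1]. As obviously $o \le x$, we have [24, Theorem III.1.5]

$$x_t \le x_t' \quad\text{for all } t \ge 0 \quad \mathbf{Q}\text{-a.s.}$$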

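To proceed, we recall a result of Holley and Stroock [18, Theorem 0.4]:

$$\sup_{\sigma,\sigma'\in E} \big|\mathbf{P}^\sigma[x_t^0 = 1] - \mathbf{P}^{\sigma'}[x_t^0 = 1]\big| \le Ce^{-\gamma t} \quad\text{for all } t \ge 0$$

for some constants $C, \gamma > 0$. By translation invariance, it follows that

$$\sup_{i\in\mathbb{Z}}\ \sup_{\sigma,\sigma'\in E} \big|\mathbf{P}^\sigma[x_t^i = 1] - \mathbf{P}^{\sigma'}[x_t^i = 1]\big| \le Ce^{-\gamma t} \quad\text{for all } t \ge 0.$$

Therefore, by monotonicity,

$$\mathbf{Q}[x_t^i \neq x_t'^i] = \mathbf{E}_{\mathbf{Q}}[x_t'^i - x_t^i] \le Ce^{-\gamma t} \quad\text{for all } t \ge 0,\ i \in \mathbb{Z}.$$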
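Using the Lipschitz property of $h$, it follows easily that

$$\mathbf{E}_{\mathbf{Q}}\Big[\int_0^\infty |h(x_t) - h(x_t')|\,dt\Big] < \infty \quad\text{and}\quad \mathbf{E}_{\mathbf{Q}}\Big[\int_0^\infty \mathbf{1}_{x_t^J \neq x_t'^J}\,dt\Big] < \infty.$$

We claim that the second inequality implies that $x_t^J = x_t'^J$ for all $t$ sufficiently
large $\mathbf{Q}$-a.s.; we postpone the verification of this claim until the end of the
proof. Assuming the claim, we now complete the proof of local mixing.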
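
Both integrability bounds follow from the exponential estimate on single spins. The sketch below writes $L$ for the Lipschitz constant of $h$ with respect to the metric $d(\sigma,\sigma') = \sum_i \alpha_i\mathbf{1}_{\sigma_i\neq\sigma_i'}$; the symbol $L$ and the explicit constants are introduced here for illustration only.

```latex
% Monotonicity: x_t <= x_t' with values in {0,1}, and the marginals of Q are P^o and P^x.
\begin{align*}
\mathbf{Q}[x_t^i \neq x_t'^i] = \mathbf{E}_{\mathbf{Q}}[x_t'^i - x_t^i]
  = \mathbf{P}^x[x_t^i = 1] - \mathbf{P}^o[x_t^i = 1] \le Ce^{-\gamma t}.
\end{align*}

% First bound: Lipschitz property of h, then Tonelli.
\begin{align*}
\mathbf{E}_{\mathbf{Q}}\Big[\int_0^\infty |h(x_t) - h(x_t')|\,dt\Big]
  \le L\sum_{i\in\mathbb{Z}}\alpha_i\int_0^\infty \mathbf{Q}[x_t^i\neq x_t'^i]\,dt
  \le \frac{LC}{\gamma}\sum_{i\in\mathbb{Z}}\alpha_i < \infty .
\end{align*}

% Second bound: union bound over the finitely many spins in J.
\begin{align*}
\mathbf{E}_{\mathbf{Q}}\Big[\int_0^\infty \mathbf{1}_{x_t^J\neq x_t'^J}\,dt\Big]
  \le \sum_{i\in J}\int_0^\infty \mathbf{Q}[x_t^i\neq x_t'^i]\,dt
  \le \frac{|J|\,C}{\gamma} < \infty .
\end{align*}
```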
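Let us extend the Wasserstein coupling $\mathbf{Q}$ to the triple $(x_t,x_t',y_t)_{t\ge 0}$ by
letting $(y_t)_{t\ge 0}$ be an inhomogeneous Poisson process with rate $\lambda_t = h(x_t)$
conditionally on $(x_t,x_t')_{t\ge 0}$. Define for any $t \ge 0$ the random variable

$$\Lambda_t = \exp\Big[\int_t^\infty \{\log h(x_{s-}') - \log h(x_{s-})\}\,dy_s - \int_t^\infty \{h(x_s') - h(x_s)\}\,ds\Big].$$

Applying [25, Lemma 19.6] conditionally yields $\mathbf{E}_{\mathbf{Q}}[\Lambda_t] = 1$ for all $t \ge 0$. By
the change of measure theorem for Poisson processes [25, Theorem 19.4]

$$\mathbf{P}^o(A) = \mathbf{E}_{\mathbf{Q}}[\mathbf{1}_A(x^J_{t,\infty},y_{t,\infty})], \qquad \mathbf{P}^x(A) = \mathbf{E}_{\mathbf{Q}}[\mathbf{1}_A(x'^J_{t,\infty},y_{t,\infty})\Lambda_t]$$

for any $A \in \mathcal{F}^J_{t,\infty}$. Thus we can estimate

$$\|\mathbf{P}^x - \mathbf{P}^o\|_{\mathcal{F}^J_{t,\infty}} \le \mathbf{Q}[x^J_{t,\infty} \neq x'^J_{t,\infty}] + \mathbf{E}_{\mathbf{Q}}[|1 - \Lambda_t|] \xrightarrow{t\to\infty} 0,$$

where the convergence follows by Scheffé's lemma and the above claim.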

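It remains to prove the claim. To this end, define the stopping times
$\tau_0 = 0$ and $\tau_n' = \inf\{t \ge \tau_n : x_t^J = x_t'^J\}$ and $\tau_{n+1} = \inf\{t \ge \tau_n' : x_t^J \neq x_t'^J\}$
for $n \ge 0$. By right-continuity $\tau_n' > \tau_n$ on $\{\tau_n < \infty\}$ for all $n \ge 1$, and

$$\mathbf{E}_{\mathbf{Q}}\Big[\sum_{n=0}^\infty (\tau_n' - \tau_n)\mathbf{1}_{\tau_n<\infty}\Big] = \mathbf{E}_{\mathbf{Q}}\Big[\int_0^\infty \mathbf{1}_{x_t^J\neq x_t'^J}\,dt\Big] < \infty.$$

Now denote by $\tau_n'' = \inf\{t \ge \tau_n : (x_t^J,x_t'^J) \neq (x_{\tau_n}^J,x_{\tau_n}'^J)\}$. As the Wasserstein
coupling is itself a particle system with uniformly bounded rates, it is a
routine exercise to verify that there exists a constant $c > 0$ such that

$$\mathbf{E}_{\mathbf{Q}}[\tau_n' - \tau_n \mid \mathcal{F}_{\tau_n}]\mathbf{1}_{\tau_n<\infty} \ge \mathbf{E}_{\mathbf{Q}}[\tau_n'' - \tau_n \mid \mathcal{F}_{\tau_n}]\mathbf{1}_{\tau_n<\infty} \ge c\,\mathbf{1}_{\tau_n<\infty} \quad \mathbf{Q}\text{-a.s.}$$

It follows that $\tau_n = \infty$ eventually $\mathbf{Q}$-a.s., which yields the claim. □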
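
The final implication in the proof above is a Borel-Cantelli argument. The following sketch fills it in, using the stopping times $\tau_n$, $\tau_n'$ and the constant $c$ from the proof; the index $N$ is introduced here for illustration.

```latex
% Take expectations in the conditional bound (the indicator of {tau_n < infinity}
% is F_{tau_n}-measurable) and sum over n:
\begin{align*}
c\sum_{n=0}^\infty \mathbf{Q}[\tau_n < \infty]
  \le \sum_{n=0}^\infty \mathbf{E}_{\mathbf{Q}}\big[(\tau_n' - \tau_n)\mathbf{1}_{\tau_n<\infty}\big]
  = \mathbf{E}_{\mathbf{Q}}\Big[\int_0^\infty \mathbf{1}_{x_t^J\neq x_t'^J}\,dt\Big] < \infty .
\end{align*}

% Borel-Cantelli, using that the events {tau_n < infinity} decrease in n:
\begin{align*}
\sum_{n=0}^\infty \mathbf{Q}[\tau_n<\infty] < \infty
  \quad\Longrightarrow\quad
  \mathbf{Q}[\tau_n < \infty \text{ for infinitely many } n] = 0 .
\end{align*}

% Let N be the last index with tau_N < infinity. Then tau_N' < infinity a.s.
% (the total disagreement time is a.s. finite) and tau_{N+1} = infinity, so
\begin{align*}
x_t^J = x_t'^J \quad\text{for all } t \ge \tau_N' .
\end{align*}
```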

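We have now verified all the assumptions of Theorem 4.20. Thus we have

$$\|\pi_t^\mu - \pi_t^\nu\|_{\mathcal{E}^J} \xrightarrow{t\to\infty} 0 \quad\text{in } \mathbf{P}^\gamma\text{-probability for all } |J| < \infty,\ \mu,\nu,\gamma \in \mathcal{P}(E),$$

where $\mathcal{E}^J \subseteq \mathcal{B}(E)$ denotes the cylinder $\sigma$-field generated by the spins $J$.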

Remark 5.16. The proof just given works only in one spatial dimension
$I = \mathbb{Z}$. In a higher-dimensional lattice $I = \mathbb{Z}^d$, the (unconditional) ergodic
theory of the spin system becomes much more subtle as phase transitions
typically appear. A Dobrushin-type sufficient condition for local mixing in
any dimension is given by Föllmer [15] for a related discrete-time model.
With some more work, this approach can also be applied to continuous-time
spin systems in the high-temperature regime (for example, by showing
that the requisite bounds hold for spatially truncated and time-discretized
models, uniformly in the truncation and discretization parameters).
5.4. Stochastic differential delay equations. Our final example is con-
cerned with filtering in stochastic differential delay equations. Time delays
arise naturally in various engineering and biological applications, and the
corresponding filtering problem has been investigated by a number of au-
thors [44, 23, 4]. In particular, some results on filter stability for linear delay
equations have been obtained in [44, 23] by means of the associated
Kalman equations. We tackle here the much more difficult nonlinear case.

Fix throughout this section a delay $r \in \mathbb{R}_+$. Following [4], for example,
we introduce the following nonlinear filtering model with time delay. The
unobserved process is defined by the stochastic differential delay equation

$$dx(t) = f(x_t)\,dt + g(x_t)\,dW_t,$$

where $(W_t)_{t\ge 0}$ is $m$-dimensional Brownian motion, $x(t)$ takes values in $\mathbb{R}^n$,
and we have introduced the notation $x_t = (x(t+s))_{s\in[-r,0]} \in C([-r,0];\mathbb{R}^n)$.
The $\mathbb{R}^d$-valued observations are defined by the white noise model

$$dy_t = h(x_t)\,dt + dB_t,$$

where $(B_t)_{t\ge 0}$ is $d$-dimensional Brownian motion independent of $(x_t)_{t\ge 0}$.

In the following, we will exploit heavily the ergodicity results for stochastic
delay equations established in [17]. To this end, we work under the following
assumptions. Here and in the sequel, we denote by $\|x\| = \sup_{s\in[-r,0]} |x(s)|$
for $x \in C([-r,0];\mathbb{R}^n)$, and by $|M|^2 = \mathrm{Tr}[MM^*]$ for any matrix $M$.

Assumption 5.17. Assume (a) there exists $g^{-1} : C([-r,0];\mathbb{R}^n) \to \mathbb{R}^{m\times n}$
with $\|g^{-1}\|_\infty < \infty$ and $g(x)g^{-1}(x) = \mathrm{Id}_n$ for all $x$; (b) $f$ is continuous
and bounded on bounded subsets of $C([-r,0];\mathbb{R}^n)$; (c) for all $x,y$, we have

$$2\langle f(x) - f(y), x(0) - y(0)\rangle^+ + |g(x) - g(y)|^2 + |h(x) - h(y)|^2 \le L\|x - y\|^2.$$

Under this assumption, the equation for $(x(t))_{t\ge 0}$ possesses a unique
strong solution for any initial condition $(x(t))_{t\in[-r,0]}$ such that $(x_t)_{t\ge 0}$ is
a $C([-r,0];\mathbb{R}^n)$-valued strong Markov process [17]. Thus the pair $(x_t,y_t)$ is
evidently a nondegenerate Markov additive process in $C([-r,0];\mathbb{R}^n) \times \mathbb{R}^d$.

The previous assumption does not ensure the existence of an invariant 
probability. Rather than imposing explicit sufficient conditions (see, for ex- 
ample, [14, 9]), it will suffice simply to assume that such a probability exists. 

Assumption 5.18. $(x_t)_{t\ge 0}$ admits an invariant probability $\lambda$.

To establish stability of the filter, we will apply Theorem 4.23. To con- 
struct an asymptotic coupling, the key result that we will use is the following. 
Theorem 5.19. For every $x,x' \in C([-r,0];\mathbb{R}^n)$, there exists a coupling
$\mathbf{Q}^{x,x'} \in \mathcal{C}(\mathbf{P}^x,\mathbf{P}^{x'})$ such that the map $(x,x') \mapsto \mathbf{Q}^{x,x'}$ is measurable and

$$\inf_{\|x\|,\|x'\|\le R} \mathbf{Q}^{x,x'}\Big[\sup_{t\ge 0} e^t\|x_t - x_t'\| < \infty\Big] =: \beta_R > 0 \quad\text{for every } R < \infty.$$



We postpone the proof of this result to the end of this section. Let us now 
show how the result can be used to verify the assumptions of Theorem 4.23. 

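We first construct the asymptotic coupling. Let us choose $R > 0$ such that
$\lambda[\|x\| \le R] > 1/2$. By [17, Theorem 3.7] and the Portmanteau theorem, we
have $\mathbf{P}^x[\|x_t\| \le R] > 1/2$ eventually as $t \to \infty$ for every $x \in C([-r,0];\mathbb{R}^n)$.
Let $\alpha = \beta_R/4$. Given any $x,x' \in C([-r,0];\mathbb{R}^n)$, we now construct a coupling
$\mathbf{Q} \in \mathcal{C}(\mathbf{P}^x,\mathbf{P}^{x'})$ as follows. First, choose $s > 0$ such that $\mathbf{P}^x[\|x_s\| \le R] \ge 1/2$
and $\mathbf{P}^{x'}[\|x_s\| \le R] \ge 1/2$. We then define the coupling $\mathbf{Q}$ such that

$$\mathbf{Q}\big[x_{s,\infty},x_{s,\infty}' \in \cdot \,\big|\, \mathcal{F}_{0,s}\big] = \mathbf{Q}^{x_s,x_s'}, \qquad \mathbf{Q}[x_{0,s},x_{0,s}' \in \cdot\,] = \mathbf{P}^x[x_{0,s} \in \cdot\,] \otimes \mathbf{P}^{x'}[x_{0,s} \in \cdot\,].$$

By construction, we have

$$\mathbf{Q}\Big[\sup_{t\ge 0} e^t\|x_t - x_t'\| < \infty\Big] \ge \alpha.$$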

Thus we have evidently verified the first assumption of Theorem 4.23 for
$d(x,x') = \|x - x'\|$ and $\Delta = 1$ (for example). On the other hand, the second
assumption follows easily as in Lemma 5.3, as we have assumed the Lipschitz
property of $h$. Thus we have verified the assumptions of Theorem 4.23, so

$$\|\pi_t^\mu - \pi_t^\nu\|_{\mathrm{BL}} \xrightarrow{t\to\infty} 0 \quad\text{in } \mathbf{P}^\gamma\text{-probability for all } \mu,\nu,\gamma \in \mathcal{P}(C([-r,0];\mathbb{R}^n)),$$

that is, we have established filter stability in the bounded-Lipschitz norm.
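
The value $\alpha = \beta_R/4$ in the construction above can be traced as follows. This is a sketch that assumes the two copies are run independently on $[0,s]$ and then glued to the coupling of Theorem 5.19 started from $(x_s,x_s')$.

```latex
% Restarting at time s only changes the supremum by the factor e^s:
\begin{align*}
\sup_{t\ge s} e^t\|x_t - x_t'\| = e^s\sup_{u\ge0} e^u\|x_{s+u} - x_{s+u}'\| .
\end{align*}

% Condition on F_{0,s}, use Theorem 5.19 on the event {||x_s||, ||x_s'|| <= R},
% then independence of x_s and x_s' under Q:
\begin{align*}
\mathbf{Q}\Big[\sup_{t\ge0} e^t\|x_t - x_t'\| < \infty\Big]
  &\ge \mathbf{E}_{\mathbf{Q}}\Big[\mathbf{1}_{\|x_s\|\le R,\,\|x_s'\|\le R}\;
       \mathbf{Q}^{x_s,x_s'}\Big[\sup_{u\ge0} e^u\|x_u - x_u'\| < \infty\Big]\Big] \\
  &\ge \beta_R\,\mathbf{P}^x[\|x_s\|\le R]\,\mathbf{P}^{x'}[\|x_s\|\le R]
   \ge \frac{\beta_R}{4} = \alpha .
\end{align*}
```

The supremum over $[0,s]$ is automatically finite because the paths are continuous, so only the behavior after time $s$ matters.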

It remains to prove Theorem 5.19. This is a direct extension of the proof 
of Theorem 3.1 in [17]; we finish the section by sketching the necessary steps. 

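Proof of Theorem 5.19. In the proof of [17, Theorem 3.1] a kernel
$(x,x') \mapsto \tilde{\mathbf{Q}}^{x,x'}$ was constructed on $\Omega \times \Omega$ with the following properties.
First, there exists a constant $\gamma > 0$ independent of $x,x'$ such that

$$\tilde{\mathbf{Q}}^{x,x'}\Big[\sup_{t\ge 0} e^t\|x_t - x_t'\| < \infty\Big] \ge \gamma \quad\text{for all } x,x' \in C([-r,0];\mathbb{R}^n).$$

Second, there is a $\tilde{\mathbf{Q}}^{x,x'}$-Brownian motion $(W_t)_{t\ge 0}$ and an adapted process
$(z_t)_{t\ge 0}$ that satisfies $\int_0^\infty |z_t|^2\,dt \le C\|x - x'\|^2$ $\tilde{\mathbf{Q}}^{x,x'}$-a.s. such that

$$\tilde{\mathbf{Q}}^{x,x'}[x_{0,\infty} \in A] = \mathbf{P}^x(A), \qquad \mathbf{E}_{\tilde{\mathbf{Q}}^{x,x'}}[\mathbf{1}_A(x_{0,\infty}')\Lambda] = \mathbf{P}^{x'}(A)$$

for every measurable set $A$, where $\Lambda$ is the Girsanov density

$$\Lambda = \exp\Big[\int_0^\infty z_t\,dW_t - \frac12\int_0^\infty |z_t|^2\,dt\Big].$$

Let $\mathbf{R}^{x,x'} \in \mathcal{C}(\tilde{\mathbf{Q}}^{x,x'},\mathbf{P}^{x'})$ be the coupling maximizing $\mathbf{R}^{x,x'}[x_{0,\infty}' = x_{0,\infty}'']$. It
is classical that $2\mathbf{R}^{x,x'}[x_{0,\infty}' \neq x_{0,\infty}''] = \|\tilde{\mathbf{Q}}^{x,x'}[x_{0,\infty}' \in \cdot\,] - \mathbf{P}^{x'}\|$, and that the
maximal coupling can be chosen to be measurable in $x,x'$ (by the existence
of a measurable version of the Radon-Nikodym density between kernels [10,
V.58]). As $\int_0^\infty |z_t|^2\,dt \le C\|x - x'\|^2$ $\tilde{\mathbf{Q}}^{x,x'}$-a.s., we can choose $\delta > 0$ sufficiently
small that $\mathbf{R}^{x,x'}[x_{0,\infty}' \neq x_{0,\infty}''] \le \gamma/2$ whenever $\|x - x'\| \le \delta$. Then evidently

$$\mathbf{R}^{x,x'}\Big[\sup_{t\ge 0} e^t\|x_t - x_t''\| < \infty\Big] \ge \frac{\gamma}{2} \quad\text{whenever } \|x - x'\| \le \delta.$$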



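Now define for any $x,x' \in C([-r,0];\mathbb{R}^n)$ the measure $\mathbf{Q}^{x,x'}$ such that

Then $\mathbf{Q}^{x,x'} \in \mathcal{C}(\mathbf{P}^x,\mathbf{P}^{x'})$, $(x,x') \mapsto \mathbf{Q}^{x,x'}$ is measurable, and

$$\inf_{\|x\|,\|x'\|\le R} \mathbf{Q}^{x,x'}\Big[\sup_{t\ge 0} e^t\|x_t - x_t'\| < \infty\Big] \ge \frac{\gamma}{2}\Big(\inf_{\|x\|\le R} \mathbf{P}^x\big[\|x_{2r}\| \le \delta/2\big]\Big)^2.$$

It remains to note that the right-hand side is positive by [17, Lemma 3.8]. □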
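
One way to make the choice of $\delta$ in this proof explicit is through Pinsker's inequality. The sketch below writes $\tilde{\mathbf{Q}}^{x,x'}$ for the kernel taken from [17] and $D(\cdot\,\|\,\cdot)$ for relative entropy. This notation, the use of Pinsker's inequality, and the resulting value of $\delta$ are assumptions of the sketch rather than statements from the source. The norm $\|\cdot\|$ is taken to be total variation, normalized so that a maximal coupling $\mathbf{R}$ of $\mu$ and $\nu$ satisfies $2\mathbf{R}[\text{components differ}] = \|\mu - \nu\|$.

```latex
% Relative entropy between \tilde Q and the tilted measure \Lambda \tilde Q.
% The stochastic integral has mean zero because \int_0^\infty |z_t|^2 dt is bounded.
\begin{align*}
D\big(\tilde{\mathbf{Q}}^{x,x'}\,\big\|\,\Lambda\tilde{\mathbf{Q}}^{x,x'}\big)
  = \mathbf{E}_{\tilde{\mathbf{Q}}^{x,x'}}[-\log\Lambda]
  = \frac12\,\mathbf{E}_{\tilde{\mathbf{Q}}^{x,x'}}\Big[\int_0^\infty |z_t|^2\,dt\Big]
  \le \frac{C}{2}\|x - x'\|^2 .
\end{align*}

% Pinsker's inequality, applied after projecting onto the second component
% (projection can only decrease total variation):
\begin{align*}
\big\|\tilde{\mathbf{Q}}^{x,x'}[x_{0,\infty}'\in\cdot\,] - \mathbf{P}^{x'}\big\|
  \le \sqrt{2\,D\big(\tilde{\mathbf{Q}}^{x,x'}\,\big\|\,\Lambda\tilde{\mathbf{Q}}^{x,x'}\big)}
  \le \sqrt{C}\,\|x - x'\| .
\end{align*}

% Hence the maximal coupling satisfies
\begin{align*}
\mathbf{R}^{x,x'}[x_{0,\infty}'\neq x_{0,\infty}'']
  \le \frac{\sqrt{C}}{2}\|x - x'\| \le \frac{\gamma}{2}
  \quad\text{whenever } \|x - x'\| \le \delta := \gamma/\sqrt{C} .
\end{align*}
```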
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